Commit Graph

8031 Commits

Author SHA1 Message Date
Vaibhav Hemant Dixit
02b17839c3
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)
Why I did it
Fix the issue where db_migrator is called before the DB is loaded with config. This leads to db_migrator:

Finding nothing, and proceeding to incorrectly migrate every missing config.
This is not expected: migration should happen only after the old config is loaded, and only new schema changes need migration.
Since the DB is empty when the migrator is called, db_migrator fails when some APIs return None.
The reason for the incorrect call order is:

The database service starts db_migrator as part of its startup sequence.
The config-setup service loads data from old-config/minigraph. However, it has Requires=database.service.
Hence, config-setup starts only when the database service is started, and the database service is started only when db_migrator has completed.
Fixed by:

Check whether this is a first-time boot by checking the pending_config_migration flag.
If pending_config_migration is set, do not call db_migrator as part of the database service startup.
Let the database service start, which triggers the config-setup service to start.
Call db_migrator after the config-setup service loads old-config/minigraph.
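The corrected ordering can be modeled in a few lines of Python. This is a minimal sketch, not the actual SONiC service scripts: `database_service_start`, `config_setup_start` and `boot` are hypothetical names, and pending_config_migration is modeled as a plain boolean rather than however the real services store it.

```python
from typing import List


def database_service_start(pending_config_migration: bool, steps: List[str]) -> None:
    """Hypothetical database service: run db_migrator only if no config load is pending."""
    steps.append("database: redis up")
    if not pending_config_migration:
        # Normal boot: the config is already in the DB, so migrating now is safe.
        steps.append("database: db_migrator")
    steps.append("database: started")


def config_setup_start(pending_config_migration: bool, steps: List[str]) -> None:
    """Hypothetical config-setup (Requires=database.service), started after database."""
    if pending_config_migration:
        steps.append("config-setup: load old-config/minigraph")
        # Migrate only now, once the old config is present in the DB.
        steps.append("config-setup: db_migrator")


def boot(pending_config_migration: bool) -> List[str]:
    steps: List[str] = []
    database_service_start(pending_config_migration, steps)
    config_setup_start(pending_config_migration, steps)
    return steps


if __name__ == "__main__":
    for step in boot(pending_config_migration=True):
        print(step)
```

The point of the model is that, on a first-time boot, db_migrator never runs against an empty DB.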
2023-05-30 10:16:21 -07:00
mssonicbld
e5b360d604 [submodule] Update submodule sonic-sairedis to the latest HEAD automatically 2023-05-30 16:32:40 +08:00
DavidZagury
e830491001
[system-health] When disabling a feature the SYSTEM_READY|SYSTEM_STATE was not updated (#14823)
- Why I did it
If you enable a feature and then disable it, the System Ready status changes to Not Ready.

A disabled feature should not affect the system ready status.

- How I did it
During the disable flow of dhcp_relay, it entered the dnsrvs_name list, which caused the SYSTEM_STATE key to be set to DOWN. Right after that, the dhcp_relay service was removed from the full service list. However, when it was removed from dnsrvs_name, there was no flow to reset the system state back to UP, even though no more services were in the down state.
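The missing reset can be sketched as follows, assuming a simplified model in which SYSTEM_STATE is derived from a set of down services. Only `dnsrvs_name` and SYSTEM_STATE come from the commit; `SysReadyTracker` and its methods are hypothetical, and this is not the actual system-health code.

```python
from typing import Set


class SysReadyTracker:
    """Hypothetical model: tracks down services and derives SYSTEM_STATE."""

    def __init__(self) -> None:
        self.down_services: Set[str] = set()  # plays the role of dnsrvs_name
        self.system_state = "UP"

    def _refresh(self) -> None:
        # Recompute on every change, including removals, so the state
        # returns to UP once no service remains down.
        self.system_state = "DOWN" if self.down_services else "UP"

    def mark_down(self, service: str) -> None:
        self.down_services.add(service)
        self._refresh()

    def remove(self, service: str) -> None:
        # e.g. the feature was disabled, so it must no longer affect readiness.
        self.down_services.discard(service)
        self._refresh()


tracker = SysReadyTracker()
tracker.mark_down("dhcp_relay")
print(tracker.system_state)  # DOWN
tracker.remove("dhcp_relay")
print(tracker.system_state)  # UP
```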

- How to verify it
root@qa-eth-vt01-2-3700v:/home/admin# config feature state dhcp_relay enabled 
root@qa-eth-vt01-2-3700v:/home/admin# show system-health sysready-status 

root@qa-eth-vt01-2-3700v:/home/admin# config feature state dhcp_relay disabled
root@qa-eth-vt01-2-3700v:/home/admin# show system-health sysready-status 

Should see
System is ready
2023-05-30 10:37:33 +03:00
Vivek
6852fcdc24
[Mellanox] Facilitate automatic integration of sdk kernel patches (#14652)
#### Why I did it

Facilitate automatic integration of SDK kernel patches into SONiC.

**Inputs to the Script:**
1) `MLNX_SDK_VERSION` Eg: `4.5.4206`
2) `MLNX_SDK_ISSU_VERSION` Eg: `101` 
 **Note: If nothing is provided, the version already present in the sdk.mk file is used**
3) `MLNX_SDK_SOURCE_BASE_URL`
 **Note: If nothing is provided, the upstream SDK drivers URL is used**
4) `CREATE_BRANCH: (y|n)` Creates a branch instead of a commit (optional, default: n) 
5) `BRANCH_SONIC`:  Only relevant when CREATE_BRANCH is y. `Default: master`. 

Note: These should be provided through the `SONIC_OVERRIDE_BUILD_VARS` parameter

**Output:**
1) The script creates a commit in sonic-linux-kernel with any updates to the SDK kernel patches, in accordance with the version provided by `MLNX_SDK_VERSION`

**Note: The script doesn't commit anything to sonic-linux-kernel when no changes are required.**

#### How I did it

1) Added a new make target which can be invoked by calling `make integrate-mlnx-sdk`

```
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6f38dca_integrate_4.5.4206

user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 1
d64d1e7 (HEAD -> master_6f38dca_integrate_4.5.4206) Intgerate MLNX SDK 4.5.4206 Kernel Patches
```

A summary of the changes is written to the `sonic-buildimage/integrate-mlnx-sdk_user.out` file. Debugging and troubleshooting output is written to the `sonic-buildimage/integrate-mlnx-sdk.log` file.

[log_files.zip](https://github.com/sonic-net/sonic-buildimage/files/11226441/log_files.zip)


#### Limitations:
1) Assumes that the SDK kernel patches are always upstreamed

#### How to verify it

Build the kernel and test
2023-05-29 22:24:06 -07:00
mssonicbld
220ea74cbb [submodule] Update submodule sonic-platform-common to the latest HEAD automatically 2023-05-29 16:32:27 +08:00
mssonicbld
105f47d38f
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15241) 2023-05-29 15:17:33 +08:00
qiwang4
359b80e012
[master]staticroutebfd process implementation (#13789)
* [BFD] staticroutebfd implementation
* To enable the BFD for static route

HLD: sonic-net/SONiC#1216
2023-05-26 16:32:05 -07:00
zitingguo-ms
a09048acd5
advance sairedis header to latest (#15227)
Advance sonic-sairedis header to include the following fix:

Remove return failure when SAI version mismatch sonic-sairedis#1248
2023-05-26 23:03:16 +08:00
Vivek
bc9c054da2
[healthd] Use unix_socket_path instead of loopback ip (#14843)
- Why I did it

The interfaces-config service restarts the networking service, which in turn causes the loopback interface address to be removed and reassigned.

If system-health happens to start during that window, exceptions and logs like the following are seen:

Apr 15 18:14:49.357869 r-panther-20 ERR healthd: update system status exception:Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:49.429778 r-panther-20 ERR healthd: subscribe_statedb exited- Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:52.218594 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:52.219714 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218636 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218722 r-panther-20 ERR healthd: system_service_Map_base::at

- How I did it
Connect to Redis using the unix socket path instead of the loopback IP.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-26 15:49:21 +03:00
Oleksandr Ivantsiv
f3ce9ebda8
[Mellanox] Update SAI to v2305.24.0.1 (#15208)
Why I did it
Align with SAI headers v1.12.0

Work item tracking
Microsoft ADO (number only):
How I did it
Update Mellanox SAI submodule

How to verify it
Compile SONiC image
2023-05-26 17:53:17 +08:00
mssonicbld
dd8b9f2502 [submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically 2023-05-26 16:32:38 +08:00
mssonicbld
82abb8b832
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15222) 2023-05-26 15:04:30 +08:00
mssonicbld
8148623eb6 [submodule] Update submodule sonic-sairedis to the latest HEAD automatically 2023-05-25 16:32:43 +08:00
mssonicbld
d1501a9496 [submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically 2023-05-25 16:32:35 +08:00
mssonicbld
1c5d7c173e [submodule] Update submodule sonic-swss-common to the latest HEAD automatically 2023-05-25 16:32:30 +08:00
Vivek
d3f2d06117
[Mellanox] Add Copyright Headers for missing files (#15136)
Added the NVIDIA copyright header to files missing it under platform/mellanox and device/mellanox
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-25 07:55:44 +03:00
Jing Kan
207e33ace8
[YANG] Add MgmtLeafRouter to Device Neighbor Metadata element type list (#15202)
Why I did it
Introduce a new valid neighbor element type to YANG.

Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.

How to verify it
Passes UTs
2023-05-24 21:02:24 -07:00
Pavan-Nokia
4d8f3cda35
[armhf][Nokia-7215]Add HWSKU files for new SAI (#15146)
* [armhf][Nokia-7215]Add HWSKU files for new SAI

Add new easy bringup (EZB) files for new SAI 1.11.0

* [Nokia][devicedata]Modified the port autoneg default setting for Nokia-7215 platform

[armhf][Nokia-7215]Update profile.ini
2023-05-24 21:01:40 -07:00
george-deng88
2b527d301f
[Celestica] Optimize Silverstone led init process (#14852)
Why I did it
Optimize the Silverstone LED init process: the `linkscan = off` setting can cause the SONiC port link status to get out of sync with the BCM shell after reboot.

How I did it
Remove redundant code.

How to verify it
After reboot, the ports link up normally.
2023-05-24 15:50:12 -07:00
Guilt
a73d443c1d
[CI][doc][build] Trim src folder files trailing blanks (#15162)
- Run pre-commit tox profile to trim all trailing blanks
- Use several commits with a per-folder based strategy
  to ease their merge

Issue #15114

Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
2023-05-24 10:01:43 -07:00
Guilt
6745691eb5
[CI][doc][build] Trim script and sonic-slave-* folders files trailing blanks (#15161)
- run pre-commit tox profile to trim all trailing blanks
- use several commits with a per-folder based strategy
  to ease their merge

Issue #15114

Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
2023-05-24 09:25:12 -07:00
Liu Shilong
4467f43449
[ci] Remove debian mirror timestamp from sonic slave base tag (#15189)
The showtag target didn't show the correct target when the MIRROR_SNAPSHOT option was set.

Microsoft ADO: 23982694
2023-05-24 09:22:12 -07:00
Junchao-Mellanox
18cf719d6a
[Mellanox] Use sysfs for sfp reset/LPM/presence (#14130)
- Why I did it
The current implementation of SFP reset, LPM, and presence relies on the SDK API. This PR moves the implementation to SDK sysfs, which brings the following benefits:
1. SDK sysfs provides better performance.
2. Host side and container side share the same code.
3. Code is much cleaner.

- How I did it
Use SDK sysfs to implement SFP reset, LPM, and presence.

- How to verify it
1. Manual test.
2. Unit test.
2023-05-24 17:24:34 +03:00
Kebo Liu
3e9437b63e
[Mellanox] Update SAI to 2211.24.0.21 and SDK/FW to 4.5.5142/2010_5144 (#15072)
SDK/FW Fixed Issues:
• When a system has more than 256 ACL entries, on rare occasion, removing/adding entries may cause some ACL entries not to work.
• When using a mirror session policer on Spectrum-2 and Spectrum-3, the actual CIR was 1.28 times the configured CIR value
• After the warm boot process, when enabling ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
• Warm boot might fail if the key value SAI_KEY_ACCUMULATED_FLOW_COUNTER_UNITS_IN_KB is set
• If counters are bound to a next-hop group, subsequent API calls that modify the next-hop group members may fail.
• On Spectrum platforms, Fastboot mode is not operational for a split port in Force mode at 50G speed
• When a fine-grained next-hop group has a size of 2K or 4K members and the group is removed, FW removes only (size % 2048) members, resulting in leakage of KVD resources
• When reading some port statistics, or bulk reading some Queue or PG statistics, while other counters are read or written in parallel, FW may, in rare cases, get stuck
• SN2201 Module 1 is considered to be present/linked while no cable/module is plugged in
• On Spectrum-3, when a port is configured to 400G, FW might get stuck after running mlxlink while a 400G interface is connected and swapping between the upper and lower 4 lanes

SAI New features:
• ACL: Added support for an ACL match on the AETH field (SAI_ACL_TABLE_ATTR_FIELD_AETH_SYNDROME, SAI_ACL_ENTRY_ATTR_FIELD_AETH_SYNDROME) to count RoCE NAK and CNP packets.
• PLL Status: Added a new logging entry that alerts the user upon a PLL lock loss event.
• Dual ToR - Additional MAC Address: Added support for setting a MAC address for the router interface which is not part of the 10 bit MAC address available for RIFs on Spectrum-1, as part of the Dual ToR scenario.
• Dual ToR - DSCP Remapping: Added support for tunnel QoS maps as part of the Dual ToR scenario.

SAI Fixed issues:
• When setting a WRED profile attribute for a color that was not enabled at profile creation time, an error would be returned. After the fix, a default profile is created in such a scenario and the set attribute is applied on top of it
• When calling the flush FDB by using the SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID attribute, the bridge bv_id value was filled in the notification callback where it should have been left empty.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-05-24 17:20:33 +03:00
mssonicbld
a9cd1a655b [submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically 2023-05-24 16:32:41 +08:00
mssonicbld
c69ddd11ed
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15196) 2023-05-24 14:09:43 +08:00
Liu Shilong
3b10334e10
[ci] Remove saiserverv2 build in official build. (#15191)
Why I did it
libsaithriftv2 build fails and nobody is maintaining saiserverv2's build.
Remove them from the official build.

Work item tracking
Microsoft ADO (number only): 23764652
How I did it
How to verify it
2023-05-23 10:22:31 +00:00
mssonicbld
2d233f5c18 [submodule] Update submodule sonic-gnmi to the latest HEAD automatically 2023-05-23 16:32:46 +08:00
mssonicbld
134f3b9854 [submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically 2023-05-23 16:32:41 +08:00
mssonicbld
3ffd2ac814 [submodule] Update submodule sonic-swss to the latest HEAD automatically 2023-05-23 16:32:37 +08:00
Sachin Holla
ba6aba2b92
[mgmt-framework] Fix rest-server startup script (#14979)
This script was using 'null' as the default value for all optional fields
of the REST_SERVER table, due to incorrect use of the 'jq -r' command.
The server was not coming up when a REST_SERVER entry existed but some
fields were not given (which is a valid configuration).
Fixed the jq query expression to return an empty string for non-existing
fields.
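The difference can be illustrated by emulating `jq -r` in Python. This is a sketch of jq semantics, not the actual fixed expression from the startup script (which is not shown here): `jq -r '.field'` prints a missing field as the literal text `null`, whereas one idiom that yields an empty string is the alternative operator, as in `jq -r '.field // ""'`. The field names `port` and `client_auth` are hypothetical examples.

```python
import json
from typing import Any, Dict


def raw_output(value: Any) -> str:
    # `jq -r` prints strings without quotes and everything else as JSON,
    # so a missing field (null) becomes the literal text "null".
    return value if isinstance(value, str) else json.dumps(value)


def jq_r_field(entry: Dict[str, Any], field: str) -> str:
    """Emulates: jq -r '.<field>'"""
    return raw_output(entry.get(field))


def jq_r_field_with_default(entry: Dict[str, Any], field: str) -> str:
    """Emulates: jq -r '.<field> // ""' (`//` substitutes "" for null or false)."""
    value = entry.get(field)
    return raw_output("" if value is None or value is False else value)


entry = {"port": "8443"}  # optional field "client_auth" omitted
print(jq_r_field(entry, "client_auth"))                     # null
print(repr(jq_r_field_with_default(entry, "client_auth")))  # ''
```

With the first form, the script ends up passing the string `null` as a configuration value; with the second, an omitted field simply becomes empty.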

Signed-off-by: Sachin Holla <sachin.holla@broadcom.com>
2023-05-22 17:42:38 -07:00
mssonicbld
f5d488dd49 [submodule] Update submodule sonic-swss-common to the latest HEAD automatically 2023-05-22 16:33:19 +08:00
Mai Bui
c5f2a0eac3
[sonic-bgpcfgd] replace yaml.load() and exit() (#14989)
#### Why I did it
It is not safe to call yaml.load with any data received from an untrusted source.
sys.exit is preferred over exit and is considered good practice in production code.
Ref:
https://stackoverflow.com/questions/6501121/difference-between-exit-and-sys-exit-in-python
https://stackoverflow.com/questions/19747371/python-exit-commands-why-so-many-and-when-should-each-be-used
##### Work item tracking
- Microsoft ADO **(number only)**: 15022050

#### How I did it
Replace yaml.load() with yaml.safe_load()
Replace exit() by sys.exit()
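The `exit()` half of this change can be illustrated with the standard library alone. The built-in `exit` is an interactive helper injected by the `site` module, so it disappears when `site` is not loaded (here forced with `python -S`), while `sys.exit` is always available. The helper `exit_code_of` is hypothetical.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    # `-S` skips the `site` module, which is what normally adds the
    # helpers `exit` and `quit` to builtins.
    result = subprocess.run(
        [sys.executable, "-S", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode


print(exit_code_of("exit(3)"))                  # 1: NameError, `exit` is undefined
print(exit_code_of("import sys; sys.exit(3)"))  # 3: sys.exit always works
```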
#### How to verify it
pass UT
test in DUT
2023-05-21 18:23:30 -07:00
mssonicbld
bef9550b1d
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#15055) 2023-05-21 16:03:32 +08:00
Yaqiang Zhu
c69d71af1c
[minigraph] Add rack_mgmt_rack parse support in minigraph.py (#15064)
Why I did it
We need to store power shelf information in config_db for the SONiC MX switch. The current minigraph parser cannot parse the rack_mgmt_map field.

Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
2023-05-20 09:25:21 -07:00
mssonicbld
b7d0f2213f
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#15150) 2023-05-20 15:16:51 +08:00
mssonicbld
8ac2696142
[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically (#15149) 2023-05-20 15:14:05 +08:00
mssonicbld
f74577f606
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15151) 2023-05-20 15:04:54 +08:00
abdosi
b5b5883c27
[minigraph.py]: Updated Static Route Minigraph Attribute property (#14951)
What I did:
Updated the Static Route attribute in Minigraph. NGS Minigraph defines the semantics of static routes differently.
See below for the differences:

Microsoft ADO: 17956325

Before:

<AssociatedTo>8.0.0.1/32</AssociatedTo>
<Address>192.168.1.2,192.168.2.2</Address>
<AttachTo>PortChannel40,PortChannel50</AttachTo>

Now:

<Address>8.0.0.1</Address>
<AttachTo>PortChannel40,192.168.1.2;PortChannel50,192.168.2.2</AttachTo>
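A minimal sketch of how the "Now" format could be split into (interface, next hop) pairs, using the standard library XML parser. The `<StaticRoute>` wrapper element and `parse_static_route` are hypothetical, and this is not the actual minigraph.py implementation; it only assumes that entries are `;`-separated and each entry is `interface,nexthop`, as in the sample above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The "Now" format, wrapped in a hypothetical <StaticRoute> element.
SAMPLE = """
<StaticRoute>
  <Address>8.0.0.1</Address>
  <AttachTo>PortChannel40,192.168.1.2;PortChannel50,192.168.2.2</AttachTo>
</StaticRoute>
"""


def parse_static_route(xml_text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (destination, [(interface, nexthop), ...])."""
    root = ET.fromstring(xml_text.strip())
    destination = root.findtext("Address", default="").strip()
    attach_to = root.findtext("AttachTo", default="").strip()
    pairs: List[Tuple[str, str]] = []
    # Entries are ';'-separated; each entry is "interface,nexthop".
    for entry in filter(None, attach_to.split(";")):
        interface, nexthop = (part.strip() for part in entry.split(",", 1))
        pairs.append((interface, nexthop))
    return destination, pairs


dest, hops = parse_static_route(SAMPLE)
print(dest)  # 8.0.0.1
print(hops)  # [('PortChannel40', '192.168.1.2'), ('PortChannel50', '192.168.2.2')]
```

Pairing each next hop with its interface in one token avoids the positional matching that the old format required between `<Address>` and `<AttachTo>`.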

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-05-19 11:49:10 -07:00
vmittal-msft
ecb4db58a9
Update PG headroom settings ports based on port speed/cable length (#14908)
* Update PG headroom settings ports based on port speed/cable length

* Updated XOFF settings to use chip-level numbers rather than core-level ones

* Updated PG headroom based on uplink/downlink side

* fix for sonic-config-gen tests

* More fixes for unit test cases

* more test fixes

* Merged multiple functions into one
2023-05-19 08:19:27 -07:00
siqbal1986
c900abbdb0
[Yang model] Add Yang models for VNET table. (#14873)
Created a YANG model for the VNET table.
https://github.com/sonic-net/sonic-buildimage/issues/14534

##### Work item tracking
- Microsoft ADO **(number only)**:
18215579
2023-05-18 14:53:26 -07:00
Pavan-Nokia
c5d0507224
[arm64][Nokia-7215-A1]Add support for Nokia-7215-A1 platform (#13795)
Add new Nokia build target and establish an arm64 build:

    Platform: arm64-nokia_ixs7215_52xb-r0
    HwSKU: Nokia-7215-A1
    ASIC: marvell
    Port Config: 48x1G + 4x10G

How I did it

- Change make files for saiserver and syncd to use the Bullseye kernel
- Change Marvell SAI version to 1.11.0-1
- Add Prestera make files to build kernel, Flattened Device Tree blob and ramdisk for arm64 platforms
- Provide device and platform related files for new platform support (arm64-nokia_ixs7215_52xb-r0).
2023-05-18 14:24:05 -07:00
Samuel Angebault
fa95ebcaae Add optional zram compression for docker_inram
Some devices running SONiC have small storage devices (mainly 2G and 4G).
The SONiC image growth over time has made it impossible to install
2 images on a single device.
Some mitigations have been implemented in the past for some devices but
there is a need to do more.

One such mitigation is `docker_inram` which creates a `tmpfs` and
extracts `dockerfs.tar.gz` in it.
This all happens in the SONiC initramfs, while the installation process
ensures that `dockerfs.tar.gz` is not extracted to flash but kept as is.

This mitigation does a tradeoff by using more RAM to reduce the disk footprint.
It however creates new issues for devices with 4G of system memory, since
the extracted `dockerfs.tar.gz` nears 1.6G.
Considering Debian upgrades (with dual base images) and the continuous
stream of features, this is only going to get bigger.

This change introduces an alternative to the `tmpfs` by allowing a system
to extract the `dockerfs.tar.gz` inside a `zram` device, thus bringing
compression into play at the expense of performance.

Introduce 2 new optional kernel parameters to be consumed by SONiC initramfs.
 - `docker_inram_size`, which represents the max physical size of the
   `zram` or `tmpfs` volume (defaults to DOCKER_RAMFS_SIZE)
 - `docker_inram_algo`, which is the method to use to extract the
   `dockerfs.tar.gz` (defaults to `tmpfs`);
   other values are considered to be compression algorithms for `zram`
   (e.g. `zstd`, `lzo-rle`, `lz4`)
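The parameter handling described above can be modeled in Python as follows. The real logic lives in the initramfs `union-mount` shell script; this sketch only mirrors the stated defaults. `parse_cmdline`, `docker_inram_settings` and the `"1500M"` size are hypothetical, `default_size` stands in for DOCKER_RAMFS_SIZE, and quoted kernel arguments are ignored for simplicity.

```python
from typing import Dict, Tuple

TMPFS = "tmpfs"


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a /proc/cmdline-style string into key=value pairs (bare flags map to '')."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def docker_inram_settings(cmdline: str, default_size: str) -> Tuple[str, str, str]:
    """Return (backend, algorithm, size) for the in-RAM docker volume."""
    params = parse_cmdline(cmdline)
    size = params.get("docker_inram_size") or default_size
    algo = params.get("docker_inram_algo") or TMPFS
    # Any value other than "tmpfs" is treated as a zram compression algorithm.
    backend = TMPFS if algo == TMPFS else "zram"
    return backend, algo, size


print(docker_inram_settings("quiet docker_inram_algo=zstd", "1500M"))
# ('zram', 'zstd', '1500M')
print(docker_inram_settings("quiet", "1500M"))
# ('tmpfs', 'tmpfs', '1500M')
```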

Refactored the logic to mount the docker fs in the SONiC initramfs under
the `union-mount` script.
Moved the code into a function to make it cleaner and separated the
inram volume creation and docker extraction.

On Arista platforms with flash smaller than or equal to 4GB, set
`docker_inram_algo` to `zstd`, which produces the best compression ratio
at the expense of slower write performance, with read performance
similar to other `zram` compression algorithms.
2023-05-18 14:21:52 -07:00
Samuel Angebault
467994c024 [Arista] Fix boot0 code for docker_inram
Enable docker_inram for all systems with 4GB or less of flash.
This is mandatory to allow these systems to store 2 SONiC images.

This change also fixes the missing docker_inram attribute when
installing a new image from SONiC.
Because the SWI image can ship with additional kernel parameters within,
such as `sonic_fips=`, this led to a conflict.
To prevent the conflict, the extra kernel parameters from the SWI are
now stored in the file `kernel-cmdline-append` which isn't used anywhere.
2023-05-18 14:21:52 -07:00
FuzailBrcm
37eddd479d
[pddf]: Adding S3IP supported attribute for FAN in PDDF (#15075)
The S3IP (Simplified Switch System Integration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.

PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is the first step in combining the two frameworks.

This specific PR adds an S3IP-supported sysfs attribute to the common FAN driver of PDDF.
2023-05-18 14:06:46 -07:00
FuzailBrcm
d6768b3259
[pddf]: Adding S3IP supported attribute for LEDs in PDDF (#15074)
The S3IP (Simplified Switch System Integration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.

PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is the first step in combining the two frameworks.

This specific PR adds the S3IP-supported sysfs attributes to the PDDF common LED driver.
2023-05-18 14:06:19 -07:00
FuzailBrcm
771a1170d8
[pddf]: Adding and enabling S3IP support in PDDF (#15073)
Why I did it
The S3IP (Simplified Switch System Integration Program) sysfs specification defines a unified interface to access peripheral hardware on devices from different vendors, making it easier for SONiC to support different devices and platforms.

PDDF is a framework to simplify the driver and SONiC platform APIs development for new platforms. This effort is the first step in combining the two frameworks.

This specific PR adds support for pddf-s3ip-init.service and enables it in PDDF.
2023-05-18 13:13:16 -07:00
xumia
819ab5db50
Change the docker image from alpine to debian in Makefile (#15132)
Why I did it
For security and consistency considerations, change the docker image from alpine to Debian in Makefile

Work item tracking
Microsoft ADO (number only): 23077660
How I did it
change the docker image from alpine to Debian in Makefile
2023-05-18 11:37:49 -07:00
abdosi
b7d04b6bd5
[minigraph]: Enhancement to minigraph parsing for chassis/multi-asic use case (#14243)
The following changes were made:

Added support so that if an ASIC's configuration is not present in the minigraph, sonic-cfggen does not error out but instead handles it gracefully.

Use case: On the Supervisor, the number of ASICs is defined as the maximum possible, but the minigraph contains configuration only for the valid/available ASICs. Without this change, load_minigraph fails.
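The graceful handling can be sketched as a loop that skips ASICs with no minigraph configuration instead of failing. This is a hypothetical illustration, not sonic-cfggen's code: `collect_asic_configs`, the `asic<N>` naming and the config dictionaries are assumptions.

```python
from typing import Dict


def collect_asic_configs(num_asics: int, minigraph_asics: Dict[str, dict]) -> Dict[str, dict]:
    """Return per-ASIC config only for the ASICs the minigraph actually describes."""
    configs: Dict[str, dict] = {}
    for index in range(num_asics):
        name = f"asic{index}"  # hypothetical naming scheme
        asic_cfg = minigraph_asics.get(name)
        if asic_cfg is None:
            # Instead of erroring out, skip ASICs that have no minigraph config.
            continue
        configs[name] = asic_cfg
    return configs


# The Supervisor declares 4 ASICs, but the minigraph only configures two of them.
result = collect_asic_configs(4, {"asic0": {"hostname": "a"}, "asic2": {"hostname": "b"}})
print(sorted(result))  # ['asic0', 'asic2']
```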

Microsoft ADO: 17956325

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-05-18 10:57:16 -07:00
lixiaoyuner
6dffa55e9c
Clean up the old version container images (#14978)
Why I did it
Our k8s feature pulls new container image versions on each upgrade, so the number of container images inside SONiC keeps growing. Currently there is no way to clean up old container image versions, so the disk may fill up. Logic to clean up old container image versions is needed.

Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove old container image versions other than the feature's current version and last version; the last-version image is kept to support fallback.
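The retention rule can be expressed as a small selection function. This is a sketch under stated assumptions: `images_to_remove` is hypothetical and only decides *which* versions to delete. It does not talk to Docker or k8s, and the version strings are illustrative.

```python
from typing import Iterable, List, Optional


def images_to_remove(
    local_versions: Iterable[str], current: str, last: Optional[str]
) -> List[str]:
    """Return image versions to delete, keeping `current` and `last` (for fallback)."""
    keep = {current}
    if last is not None:
        keep.add(last)
    # Sorted only to make the result deterministic.
    return sorted(v for v in set(local_versions) if v not in keep)


print(images_to_remove(["1.0.0", "1.0.1", "1.0.2", "1.0.3"], current="1.0.3", last="1.0.2"))
# ['1.0.0', '1.0.1']
```

Keeping exactly two versions bounds disk usage while still leaving one image to fall back to if the new version misbehaves.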

How to verify it
Check whether the old version images are removed
2023-05-18 10:37:34 -07:00