1225 Commits

Author SHA1 Message Date
mssonicbld
0fe5c9fc7d
[platform]: Disable interrupt for intel i2c-i801 driver (#16309) (#16457)
On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.

We now disable the i801 driver interrupt and enable polling instead.

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt disable feature argument to the i2c-i801 driver.

How to verify it
This fix is NOT applicable to ARM-based platforms. It applies only to Intel-based platforms:

- On SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100: verified that the interrupts are no longer incrementing.
- Arista 7260CX3
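
One way to repeat the "no longer incrementing" check is to sample `/proc/interrupts` twice and compare the totals for the SMBus controller. The sketch below is not part of the commit. It assumes the controller's row is labelled `i801_smbus` (worth confirming on the target platform), and the function names are illustrative.

```python
def irq_total(proc_interrupts_text, label="i801_smbus"):
    """Sum the per-CPU counters of every /proc/interrupts row mentioning `label`."""
    total = 0
    for line in proc_interrupts_text.splitlines():
        if label not in line or ":" not in line:
            continue
        # Row layout: "<irq>: <cpu0> <cpu1> ... <chip> <type> <device names>"
        fields = line.split(":", 1)[1].split()
        for field in fields:
            if not field.isdigit():
                break  # the first non-numeric field ends the per-CPU counters
            total += int(field)
    return total


def read_irq_total(path="/proc/interrupts", label="i801_smbus"):
    """Read the live counters; `label` is an assumption about the row name."""
    with open(path) as f:
        return irq_total(f.read(), label)
```

Calling `read_irq_total()` twice a few seconds apart and getting the same number is consistent with the driver running in polling mode.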

Signed-off-by: Prince George <prgeor@microsoft.com>
Co-authored-by: Prince George <45705344+prgeor@users.noreply.github.com>
2023-09-06 09:49:58 -07:00
mssonicbld
07955af2ed
[ci/build]: Upgrade SONiC package versions (#16316) 2023-09-05 21:54:50 -07:00
mssonicbld
d5e2c0004f
Assign the higher metric value for Ipv6 default route learnt via RA message (#16367) (#16440)
* Fix the Loopback0 IPv6 address of LCs in a chassis not being reachable from peer devices
* Assign a higher metric value to the IPv6 default route learnt via RA message so that the BGP-learnt default route takes priority.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
2023-09-05 21:52:38 -07:00
Junchao-Mellanox
874ca68060
Fix issue: set has_timer attribute to true for platform monitor service (#15624)
There is a redundant line in init_cfg.json.j2 that causes the pmon service to always have "has_timer=False". However, PMON has a timer now, so this change fixes it.
2023-09-04 19:38:21 -07:00
mssonicbld
f7f2e654c4
[chassis] Chassis DB cleanup when asic comes up (#16213) (#16378)
* [chassis]Chassis DB cleanup when asic comes up

Cleanup the entries from the following tables in chassis app db in
redis_chassis server in the supervisor
(1) SYSTEM_NEIGH
(2) SYSTEM_INTERFACE
(3) SYSTEM_LAG_MEMBER_TABLE
(4) SYSTEM_LAG_TABLE
As part of the clean up only those entries created by the asic that
is coming up are deleted. The LAG IDs used by the asics are also
de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET

- Added check to run the chassis db clean up only for voq switches.
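
The selection rule described above (only entries created by the asic that is coming up, and only on voq switches) can be sketched as a pure filter. This is an illustration, not the code from the commit. The key layout `<TABLE>|<hostname>|<asic>|<rest>` and the function name are assumptions, and de-allocating the LAG IDs is left out.

```python
TABLES = (
    "SYSTEM_NEIGH",
    "SYSTEM_INTERFACE",
    "SYSTEM_LAG_MEMBER_TABLE",
    "SYSTEM_LAG_TABLE",
)


def keys_to_delete(all_keys, hostname, asic, switch_type):
    """Return the chassis app db keys owned by (hostname, asic).

    ASSUMPTION: keys look like "<TABLE>|<hostname>|<asic>|<rest>".
    """
    if switch_type != "voq":
        return []  # the cleanup runs only on voq switches
    owned = []
    for key in all_keys:
        parts = key.split("|")
        if (
            len(parts) >= 3
            and parts[0] in TABLES
            and parts[1] == hostname
            and parts[2] == asic
        ):
            owned.append(key)
    return owned
```

Keeping the filter separate from the delete calls makes it easy to check that entries from other asics are never selected.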

Signed-off-by: vedganes <veda.ganesan@nokia.com>
Co-authored-by: vganesan-nokia <67648637+vganesan-nokia@users.noreply.github.com>
2023-09-01 16:20:31 -07:00
mssonicbld
46e562b881
[ci/build]: Upgrade SONiC package versions (#16214) 2023-08-28 09:29:43 -07:00
Junchao-Mellanox
611449dc88
Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253) (#15959)
A workaround to backport the fix for a systemd issue.

The systemd issue: systemd/systemd#24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files

The formal solution is to upgrade systemd to a version that contains the fix. However, systemd is a very basic service, and upgrading it requires heavy testing.
2023-08-22 09:54:56 -07:00
mssonicbld
f95031b5ab
[ci/build]: Upgrade SONiC package versions (#16124) 2023-08-16 13:30:16 -07:00
mssonicbld
270820c1cf
[chassis]: removed dependency for bgp and swss for chassis supervisor (#15734) (#16099)
Fixes #15667 and #13293

Work item tracking
Microsoft ADO: 24472854

How I did it
On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled.

How to verify it
Tests on chassis supervisor and LC

Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
2023-08-11 08:39:22 -07:00
mssonicbld
f835098361
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685) (#16098)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
2023-08-11 08:38:59 -07:00
mssonicbld
a134bfe0b2
[syncd.sh] Clear semaphore before updating firmware (#15818) (#16068)
Why I did it
The hw resources should be released before updating firmware.

How I did it
Added logic to release hw resources in syncd.sh script

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
2023-08-10 13:34:52 -07:00
mssonicbld
d351e05f82
[monit][dualtor] Periodically check mux neighbors consistency (#15769) (#15954)
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-08-10 13:33:09 -07:00
mssonicbld
5d250c6264
[ci/build]: Upgrade SONiC package versions (#15940) 2023-08-10 13:32:17 -07:00
mssonicbld
a03489a413
[ci/build]: Upgrade SONiC package versions (#15939) 2023-07-22 15:52:35 -07:00
mssonicbld
ab0768eb15
Update WRED profile on system ports (#15612) (#15914)
* Update WRED profile on system ports

Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>
2023-07-20 08:39:54 -07:00
mssonicbld
0291dae68a
[ci/build]: Upgrade SONiC package versions (#15855) 2023-07-19 08:28:14 -07:00
mssonicbld
1b32bf6b2d
update rsyslog log size conf (#15821) (#15845) 2023-07-15 05:47:03 +08:00
mssonicbld
7c6a1612d1
[ci/build]: Upgrade SONiC package versions (#15766) 2023-07-13 08:27:25 -07:00
mssonicbld
7e5156b64c
[ci/build]: Upgrade SONiC package versions (#15760) 2023-07-08 09:45:59 -07:00
mssonicbld
2d1efaec67
Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684) (#15746) 2023-07-08 07:21:28 +08:00
mssonicbld
6f6db96634
[ci/build]: Upgrade SONiC package versions (#15700) 2023-07-07 14:27:19 -07:00
lixiaoyuner
6922edba80
Move k8s script to docker-config-engine (#14788) (#15740)
Why I did it
To reduce the container's dependency on the host system

Work item tracking
Microsoft ADO (number only): 17713469

How I did it
Move the k8s container startup script into the config engine container, rather than mounting it from the host.

How to verify it
Check the file path (/usr/share/sonic/scripts/container_startup.py) inside the config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-07 09:22:58 -07:00
mssonicbld
7952fe7f4d
[arp_update]: Fix IPv6 neighbor race condition (#15583) (#15694) 2023-07-01 10:21:55 +08:00
mssonicbld
a4a084f812
[mlnx-ffb.sh] Update issu-version location (#14925) (#15673)
#### Why I did it

ISSU version check fails due to inability to mount squashfs from 202211 on 201911

#### How I did it

Put ISSU version file under platform directory

#### How to verify it

Warm-upgrade matrix:
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211
- 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master
- 202205 (with this change cherry-picked) to master

Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2023-06-30 13:53:33 -07:00
mssonicbld
1c6e87657e
[ci/build]: Upgrade SONiC package versions (#15615) 2023-06-28 09:28:41 -07:00
mssonicbld
5db1a495a1
[ci/build]: Upgrade SONiC package versions (#15525) 2023-06-21 17:25:31 -07:00
mssonicbld
d5d674e89d
Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464) (#15517) 2023-06-17 09:18:08 +08:00
siqbal1986
2d436cc59d
202205 cast for https://github.com/sonic-net/sonic-buildimage/pull/14992 (#15499)
Why I did it
CP of original PR #14992, which failed automatic CP.

Work item tracking
Microsoft ADO (number only): 21695894
2023-06-16 08:36:18 -07:00
mssonicbld
c4fcd31fa6
enable ethernet backplane port support in port config for packet mode T2 devices (#14533) (#15479) 2023-06-16 03:52:17 +08:00
Liping Xu
40ef03e70b
allow docker_inram to kernel cmd list (#15374)
Why I did it
After docker_inram is enabled, the docker folder's default max size is 1.5G.
It's not big enough for some tests which need to install additional docker images or install extra packages.

Work item tracking
Microsoft ADO: 24199761

How I did it
Add docker_inram to cmdline_allowlist

How to verify it
sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append'
Then run sudo reboot and check the docker folder size.
2023-06-15 14:33:54 +08:00
Saikrishna Arcot
9e16a7a452
Re-add 127.0.0.1/8 when bringing down the interfaces (#15080) (#15462)
* Re-add 127.0.0.1/8 when bringing down the interfaces

With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.

To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.

Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-14 16:28:57 -07:00
mssonicbld
fbe5fe736e
[ci/build]: Upgrade SONiC package versions (#15326) 2023-06-06 15:40:37 -07:00
mssonicbld
b0abe7149a
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933) (#15316) 2023-06-03 09:30:12 +08:00
vmittal-msft
723c508a30
Update PG headroom settings ports based on port speed/cable length (#15287)
Why I did it
Update cable length for uplink/downlink ports for chassis and update PG/pool headroom size accordingly.

Work item tracking
17880812

How I did it
Updated cable length as well as buffer config in HWSKU files.
2023-06-02 15:48:11 -07:00
Samuel Angebault
feb8671601
[202205] Implement zram compression for docker in RAM (#15137)
* [Arista] Fix boot0 code for docker_inram

Enable docker_inram for all systems with 4GB or less of flash.
This is mandatory to allow these systems to store 2 SONiC images.

This change also fixes the missing docker_inram attribute when
installing a new image from SONiC.
Because the SWI image can ship with additional kernel parameters
such as `sonic_fips=`, this led to a conflict.
To prevent the conflict, the extra kernel parameters from the SWI are
now stored in the file `kernel-cmdline-append` which isn't used anywhere.

* Add optional zram compression for docker_inram

Some devices running SONiC have a small storage device (2G and 4G mainly).
The SONiC image growth over time has made it impossible to install
2 images on a single device.
Some mitigations have been implemented in the past for some devices but
there is a need to do more.

One such mitigation is `docker_inram` which creates a `tmpfs` and
extracts `dockerfs.tar.gz` in it.
This all happens in the SONiC initramfs, and relies on the installation
process not extracting `dockerfs.tar.gz` onto the flash but keeping the file as is.

This mitigation makes a tradeoff by using more RAM to reduce the disk footprint.
It however creates new issues for devices with 4G of system memory since
the extracted `dockerfs.tar.gz` nears 1.6G.
Considering Debian upgrades (with dual base images) and the continuous
stream of features this is only going to get bigger.

This change introduces an alternative to the `tmpfs` by allowing a system
to extract the `dockerfs.tar.gz` inside a `zram` device thus bringing
compression in play at the detriment of performance.

Introduce 2 new optional kernel parameters to be consumed by SONiC initramfs.
 - `docker_inram_size`, which represents the max physical size of the
   `zram` or `tmpfs` volume (defaults to DOCKER_RAMFS_SIZE)
 - `docker_inram_algo`, which is the method used to extract the
   `dockerfs.tar.gz` (defaults to `tmpfs`);
   other values are considered to be compression algorithms for `zram`
   (e.g. `zstd`, `lzo-rle`, `lz4`)
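
The defaulting rules for the two parameters can be sketched as follows. The real logic lives in the initramfs scripts. This Python version only mirrors the rules stated above, and the function name and the `default_size` argument (standing in for DOCKER_RAMFS_SIZE) are illustrative assumptions.

```python
def docker_inram_settings(cmdline, default_size):
    """Return (size, algo, uses_zram) derived from a kernel command line string.

    `default_size` stands in for DOCKER_RAMFS_SIZE; its value is not assumed here.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and value:
            params[key] = value
    size = params.get("docker_inram_size", default_size)
    algo = params.get("docker_inram_algo", "tmpfs")
    # Any value other than "tmpfs" is treated as a zram compression algorithm.
    return size, algo, algo != "tmpfs"
```

For example, `docker_inram_settings("docker_inram=on docker_inram_algo=zstd", "1500M")` keeps the default size and selects `zram` with `zstd`.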

Refactored the logic to mount the docker fs in the SONiC initramfs under
the `union-mount` script.
Moved the code into a function to make it cleaner and separated the
inram volume creation and docker extraction.

On Arista platforms with a flash smaller than or equal to 4GB, set
`docker_inram_algo` to `zstd`, which produces the best compression ratio
at the cost of slower write performance, with read
performance similar to other `zram` compression algorithms.
2023-06-02 08:36:18 -07:00
mssonicbld
6cf6c59c8c
[ci/build]: Upgrade SONiC package versions (#15245) 2023-05-28 20:42:02 +08:00
mssonicbld
d00dd1fca7
[ci/build]: Upgrade SONiC package versions (#15243) 2023-05-27 20:17:51 +08:00
mssonicbld
036b8d1315
[ci/build]: Upgrade SONiC package versions (#15192) 2023-05-23 20:43:44 +08:00
mssonicbld
11226b9ca4
[ci/build]: Upgrade SONiC package versions (#15174) 2023-05-21 20:38:55 +08:00
mssonicbld
80aba31433
[arp_update] Resolve neighbors from config_db (#15006) (#15124) 2023-05-18 08:50:55 +08:00
judyjoseph
b6df524b0f
Add override_config to load_minigraph in config-setup service (#14834) (#15097)
This PR handles overriding the minigraph config with the golden_config_db.json file if it is present in the backup location.
2023-05-17 13:17:20 -07:00
Tejaswini Chadaga
a3a041a3cd
Revert "Add load_minigraph option to include traffic-shift-away during config migration (#11403)" (#14881)
This reverts commit 0c7f0aa9b7.
2023-05-03 17:10:15 -07:00
mssonicbld
0ed0df6ddb
[ci/build]: Upgrade SONiC package versions (#14913) 2023-05-02 20:20:59 +08:00
Ying Xie
9ac9908321
Revert "Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516) (#13695)" (#14900)
This reverts commit d1fa414f1b.
2023-05-01 16:48:56 -07:00
mssonicbld
5d9658f503
[ci/build]: Upgrade SONiC package versions (#14839) 2023-04-29 20:49:34 +08:00
mssonicbld
36b6d5824c
[ci/build]: Upgrade SONiC package versions (#14812) 2023-04-23 20:52:29 +08:00
mssonicbld
7cc8c76f0f
Increase wait_for_tunnel() timeout to 90s (#14279) (#14733) 2023-04-20 05:47:12 +08:00
mssonicbld
20bb5daa6a
[ci/build]: Upgrade SONiC package versions 2023-04-18 22:39:02 +08:00
mssonicbld
94ba969676
[write standby] force DB connections to use unix socket to connect (#14524) (#14553) 2023-04-18 17:11:59 +08:00
anamehra
0b30826e56
chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It was observed that some sonic-mgmt test cases call sonic-clear arp, which clears the static arp entries as well. Neither orchagent nor the arp_update process tries to resolve the missing arp entries after the clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and, if any are
found, ping them to resolve them.

How to verify it
After boot or config reload, check ipv4 and ipv6 neigh entries to make sure all static route entries are present.
Manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries.
Run arp_update.
Check for neigh entries. All entries should be present.
Testing on T0 setup for route/test_static_route.py

The test sets the STATIC_ROUTE entry in config db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that arp_update gets the proper ARP_UPDATE_VARS using the arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }
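
The mapping from the STATIC_ROUTE entry to the list of nexthops, and the "check for missing entries" step, can be sketched as below. The actual change is made in the arp_update shell script and its Jinja2 template. The function names and the de-duplication here are illustrative assumptions, not the commit's code.

```python
def static_route_nexthops(config_db):
    """Collect nexthop IPs from all STATIC_ROUTE entries, in order, without duplicates."""
    seen = []
    for entry in config_db.get("STATIC_ROUTE", {}).values():
        for ip in entry.get("nexthop", "").split(","):
            ip = ip.strip()
            if ip and ip not in seen:
                seen.append(ip)
    return seen


def unresolved(nexthops, neighbor_table):
    """Nexthops with no neighbor entry; these are the candidates to ping."""
    return [ip for ip in nexthops if ip not in neighbor_table]
```

With the CONFIG_DB entry shown above, `" ".join(static_route_nexthops(cfg))` gives the same three addresses that appear in `static_route_nexthops`, and `unresolved` lists whichever of them are absent from the neighbor table after a clear.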

Validate that the route/test_static_route.py test case passes.
2023-04-18 14:34:49 +08:00