Releases 202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, this change avoids mounting squashfs altogether.
Why I did it
Some products might experience an occasional IO failure in the communication between the CPU and the SSD.
Based on some research, it could be attributable to some devices not handling ATA NCQ (Native Command Queuing) properly.
This issue currently affects the following products:
DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64
DCS-7050CX3-32S
How I did it
This change disables NCQ on the affected drive for a small set of products.
How to verify it
When the fix is applied, these two patterns can be found in the dmesg output:
ata[0-9]+.00: FORCE: horkage modified (noncq)
NCQ (not used)
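The check above can be automated. The following is a minimal sketch, assuming the dmesg output has already been captured as text; the function name `ncq_disabled` and the first sample line are illustrative and not part of this change.

```python
import re

# Patterns taken from the verification notes above (the dot is escaped here).
FORCE_PATTERN = re.compile(r"ata[0-9]+\.00: FORCE: horkage modified \(noncq\)")
NOT_USED_PATTERN = re.compile(r"NCQ \(not used\)")


def ncq_disabled(dmesg_text: str) -> bool:
    """Return True only if both expected patterns appear in the dmesg text."""
    return bool(FORCE_PATTERN.search(dmesg_text)) and bool(
        NOT_USED_PATTERN.search(dmesg_text)
    )


# Sample input: the first line is constructed from the pattern (illustrative),
# the second is copied from the test results in this description.
sample = (
    "ata1.00: FORCE: horkage modified (noncq)\n"
    "ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used)\n"
)
print(ncq_disabled(sample))
```

On a device, the text could come from the output of `dmesg`; how it is collected is left open here.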
Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)
READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))
READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
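To put the two runs side by side, the relative bandwidth drop can be computed from the figures above. This is only an arithmetic sketch; the helper name `percent_drop` is made up for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Bandwidth figures in MiB/s, copied from the fio results above.
read_drop = percent_drop(33.9, 31.7)
write_drop = percent_drop(34.1, 31.9)
print(f"READ drop: {read_drop:.1f}%, WRITE drop: {write_drop:.1f}%")
```

Both READ and WRITE come out at roughly 6.5% lower bandwidth with NCQ disabled in this particular fio run.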
Change to use the snapshot mirror http://packages.trafficmanager.net/snapshot.
Warning: The Jessie distribution is EOL; please avoid using it if you can. The snapshot mirror will also be removed in the near future.
Why I did it
docker.com's GPG key is only valid from 2023-02-23, while debian.org's GPG key expired in 2022-11.
We used a workaround for the security check of the Debian GPG keys. Now we need to exclude docker.com's GPG key from that workaround.
How I did it
Update docker.com's GPG key without faketime.
Update the other GPG keys with faketime '2022-11'.
How to verify it
#### Why I did it
To allow SSH connections from IPv6 addresses
Resolves https://github.com/Azure/sonic-buildimage/issues/7668
#### How I did it
In build_debian.sh, modify the sshd_config file to enable listening for IPv6 connections.
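As an illustration of the kind of edit involved, here is a minimal sketch that turns on an IPv6 listen address in sshd_config text. It assumes the change is made through the standard `ListenAddress ::` directive; the actual edit in build_debian.sh may use a different directive or tool, and the function name is hypothetical.

```python
def enable_ipv6_listen(sshd_config: str) -> str:
    """Return config text with an uncommented 'ListenAddress ::' line.

    Assumption: IPv6 listening is enabled through the ListenAddress
    directive; the real change in build_debian.sh may differ.
    """
    wanted = "ListenAddress ::"
    lines = sshd_config.splitlines()
    # Nothing to do if the directive is already active.
    if any(line.strip() == wanted for line in lines):
        return sshd_config
    out = []
    replaced = False
    for line in lines:
        # Uncomment the first commented-out copy of the directive, if any.
        if not replaced and line.strip() == "#" + wanted:
            out.append(wanted)
            replaced = True
        else:
            out.append(line)
    # Otherwise append the directive at the end of the file.
    if not replaced:
        out.append(wanted)
    return "\n".join(out) + "\n"


sample_config = "#Port 22\n#ListenAddress 0.0.0.0\n#ListenAddress ::\n"
print(enable_ipv6_listen(sample_config))
```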
- Why I did it
Added BIOS upgrade infra
- How I did it
Added new make target
- How to verify it
Copy msn3800_bios.tar.gz to platform/mellanox/bios
make configure PLATFORM=mellanox
make target/files/stretch/msn3800_bios.tar.gz
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
The GPG key used for Jessie's official repos has since expired, which means building 201911 images no longer works.
#### How I did it
Fake the time to be before the expiry date.
* Fix to improve hostname handling
If config_db.json is missing the hostname entry, hostname-config.sh ends
up deleting the existing entry too, and the hostname changes to the default 'localhost'
* Default the hostname to 'sonic' if it is missing in the config file
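The fallback described above can be sketched as follows. hostname-config.sh itself is a shell script, so this Python version is only an illustration of the logic; it assumes the hostname is stored under `DEVICE_METADATA` / `localhost` / `hostname` in config_db.json, and the function name is hypothetical.

```python
import json

DEFAULT_HOSTNAME = "sonic"


def resolve_hostname(config_db_text: str) -> str:
    """Return the configured hostname, or the default if the entry is missing."""
    config = json.loads(config_db_text)
    # Assumed location of the hostname entry in config_db.json.
    hostname = (
        config.get("DEVICE_METADATA", {}).get("localhost", {}).get("hostname")
    )
    return hostname or DEFAULT_HOSTNAME


with_entry = '{"DEVICE_METADATA": {"localhost": {"hostname": "switch1"}}}'
without_entry = '{"DEVICE_METADATA": {"localhost": {}}}'
print(resolve_hostname(with_entry), resolve_hostname(without_entry))
```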
Upgrade systemd to fix timer elapsed issue.
#### Why I did it
On the 201911 release, snmp.timer enters the elapsed state and snmp.service is never triggered by snmp.timer:
● snmp.service - SNMP container
Loaded: loaded (/usr/lib/systemd/system/snmp.service; static; vendor preset: enabled)
Active: inactive (dead)
● snmp.timer - Delays snmp container until SONiC has started
Loaded: loaded (/usr/lib/systemd/system/snmp.timer; enabled; vendor preset: enabled)
Active: active (elapsed) since Wed 2022-08-03 18:12:59 UTC; 2 months 17 days ago
This issue is caused by a systemd bug: https://github.com/systemd/systemd/pull/10778/files
This issue can be reproduced with the following steps:
1. Reboot the system.
2. Continuously run the following commands until the timer elapses:
systemctl status snmp.timer
sudo systemctl daemon-reload
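When repeating the steps above in a loop, the elapsed state can be detected from the `systemctl status` output. The following is a rough sketch; the function name is hypothetical, and the sample text is adapted from the status output quoted earlier in this description.

```python
import re


def timer_is_elapsed(status_output: str) -> bool:
    """Return True if the 'Active:' line reports 'active (elapsed)'."""
    pattern = r"^\s*Active:\s+active \(elapsed\)"
    return re.search(pattern, status_output, re.MULTILINE) is not None


sample_status = (
    "● snmp.timer - Delays snmp container until SONiC has started\n"
    "Loaded: loaded (/usr/lib/systemd/system/snmp.timer; enabled; vendor preset: enabled)\n"
    "Active: active (elapsed) since Wed 2022-08-03 18:12:59 UTC; 2 months 17 days ago\n"
)
print(timer_is_elapsed(sample_status))
```

A reproduction loop could feed the output of `systemctl status snmp.timer` into this check after each `daemon-reload`.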
#### How I did it
Install the latest version of systemd from the official backports source.
#### How to verify it
All test cases pass.
Manually ran the reproduction steps and verified that the issue is fixed.
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Upgrade systemd to fix timer elapsed issue.
Certain platform-specific packages (sonic-platform-xyz) install files onto the rootfs, and these files end up on the read-write mount path /host/image-name/rw/...
When ntpd starts, it tries to read /usr/bin, /usr/sbin and /usr/local/bin, which in turn resolve to the read-write mount path as well.
As a result, ntpd triggers the AppArmor denial messages below.
Log:
audit: type=1400 audit(1606226503.240:21): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/local/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
audit: type=1400 audit(1606226503.240:22): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/sbin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
audit: type=1400 audit(1606226503.240:23): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Fix:
Add the rw/.. mount path, similar to the root path access already provided for ntpd, in /etc/apparmor.d/usr.sbin.ntpd
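To see which paths the profile needs to cover, the denied names can be pulled out of the audit log. This is a small sketch that assumes the log format shown above; the function name is hypothetical, and it only lists the denied paths without generating AppArmor rules.

```python
import re

# Matches the profile and the denied path in an AppArmor audit line.
DENIED_RE = re.compile(r'apparmor="DENIED".*?profile="([^"]+)".*?name="([^"]+)"')


def denied_paths(log_text: str, profile: str = "/usr/sbin/ntpd") -> list:
    """Return the unique denied paths for one profile, in order of appearance."""
    paths = []
    for line in log_text.splitlines():
        match = DENIED_RE.search(line)
        if match and match.group(1) == profile and match.group(2) not in paths:
            paths.append(match.group(2))
    return paths


# Two of the log lines quoted above.
sample_log = (
    'audit: type=1400 audit(1606226503.240:21): apparmor="DENIED" operation="open" '
    'profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/local/bin/" '
    'pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0\n'
    'audit: type=1400 audit(1606226503.240:22): apparmor="DENIED" operation="open" '
    'profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/sbin/" '
    'pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0\n'
)
for path in denied_paths(sample_log):
    print(path)
```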
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
Porting https://github.com/sonic-net/sonic-buildimage/pull/3723 to 201911
#### Why I did it
Extend Mellanox FW utils with CPLD update feature
Added support for CPLD upgrade to Mellanox FW utility
#### How I did it
Updated Mellanox FW utility
#### How to verify it
mlnx-fw-upgrade.sh --upgrade --cpld # Regular CPLD update flow
UPDATE_MLNX_CPLD_FW=1 mlnx-fw-upgrade.sh --upgrade # Force CPLD refresh only
Cherry-pick the commit from master that fixes a BGP template rendering failure on multi-ASIC platforms, where rendering needs the Loopback4096 IP address. The issue is caused by a timing/race condition: the peer gets added first, and only afterwards does the Loopback4096 notification reach bgpcfgd.
- Why I did it
Collect an MST dump before syncd restarts when a shutdown notification is received during a SAI failure
Dump can be found under:
root@sonic:/home/admin# ls -l /var/dump/mstdump/
total 10684
-rw-r--r-- 1 root root 5460332 Aug 15 18:41 mstdump_20220815_184143.tar.gz
-rw-r--r-- 1 root root 5473253 Aug 15 21:46 mstdump_20220815_214642.tar.gz
root@sonic:/home/admin# tar -xvzf /var/dump/mstdump/mstdump_20220815_214642.tar.gz
├── ir-gdb
│ └── core
└── mstdump
├── mstdump1
├── mstdump2
├── mstdump3
└── mststatus
- How I did it
Checked for shutdown notification log in sairedis and used it to determine whether the shutdown is normal or due to SAI failure
- How to verify it
Simulated a SAI failure event and verified that the dump is generated. Also verified that the dump is not generated in various reboot and config reload scenarios
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Why I did it
Add the hardware reboot cause when the previous software reboot failed
How I did it
Check both the hardware reboot cause and the software reboot cause.
If a hardware reboot cause is available after a software reboot, report the hardware reboot cause as the actual reboot cause.
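The precedence described above can be sketched as follows. This is not the actual implementation: the function name, the use of `None` for "no cause available", the `"Unknown"` fallback, and the example cause strings are all assumptions made for illustration.

```python
from typing import Optional


def actual_reboot_cause(
    hardware_cause: Optional[str], software_cause: Optional[str]
) -> str:
    """Pick the reboot cause to report, preferring the hardware cause.

    Assumption: None means that no cause of that kind was recorded.
    """
    if hardware_cause:
        # A hardware cause is available, so it wins over the software cause.
        return hardware_cause
    if software_cause:
        return software_cause
    # Hypothetical fallback when neither cause is available.
    return "Unknown"


# Example cause strings are illustrative only.
print(actual_reboot_cause("Watchdog", "reboot"))
print(actual_reboot_cause(None, "reboot"))
```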
How to verify it
Perform reboots and verify the reboot-cause
Why I did it
Add Celestica Silverstone-x platform
How I did it
Add Celestica Silverstone-x platform
How to verify it
Verified by the SONiC tested platform APIs
Verified by SONiC APIs including:
psuutil
psushow (show platform psustatus)
sfputil
sfpshow
tempershow (show platform temperature)
fanshow (show platform fan)
watchdogutil
fwutil (show platform firmware status)
decode-syseeprom -d (show platform syseeprom)
show platform ssdhealth
show platform summary
show interfaces status
What/Why I did:
Update the Broadcom SAI Debian package. The new package has the following changes:
Case CS00012248135: Fix for the error message "linux-bcm-knet: Fatal error: Incomplete chain" followed by malformed LACP/LLDP packets
Why I did it
62b7b56 2022-07-13 | Remove disabled and not loaded services before calling reset-failed and restart services (#2266) [Zain Budhwani]
09b4678 2022-07-05 | [config/load_mgmt_config] Support load IPv6 mgmt IP (#2206) (#2246) [Jing Kan]
How I did it
Pulled latest commit from 201911 sonic-utilities branch and created PR
How to verify it
Look at build-image
```
23fc702 [201911][patch] mlxsw: i2c: Prevent transaction execution for special chip st (#278)
e4f44e4 [201911] Increase log buf len size to 1M (#265)
ef6abe3 [201911] Apply kernel patches to fix emmc unreliability (#264)
7458347 [201911] Increase log buf len size to 1M
4edf1b4 [201911] Apply kernel patches to fix emmc unreliability
```
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
I updated sonic-linux-kernel to pick up a fix for a bug that occurs during ISSU and causes a CPU stall.
How I did it
Updated submodule
How to verify it
Build and run warm reboot
Why I did it
Improve throughput and latency for 7260 deployments
How I did it
Update the dynamic threshold to 0 and ECN settings as 2mb/10mb/5%
How to verify it
Added unit tests for rendering the qos template for 7260. Built sonic config engine wheel successfully
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Added Support for Celestica Midstone-100x platform
How I did it
Implemented the support for Celestica Midstone-100x platform
Platform: x86_64-cel_midstone-100x-r0
HwSKU: Midstone-100x
ASIC: innovium
ASIC Count: 1
How to verify it
Run platform test on testbed
Why I did it
There is a need to select different mmu profiles based on deployment type
How I did it
Each hwsku folder will contain separate subfolders (RDMA-CENTRIC, TCP-CENTRIC, BALANCED) holding deployment-specific MMU and QoS settings. The SonicQosProfile attribute in the minigraph determines which settings to use. If that attribute is not present, the default settings in the hwsku folder are used.
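The selection rule can be sketched as below. The function name and the example path are hypothetical, and falling back to the default settings for an unrecognized attribute value is an assumption; the description only covers the case where the attribute is absent.

```python
from pathlib import Path
from typing import Optional

# Subfolder names taken from the description above.
KNOWN_PROFILES = {"RDMA-CENTRIC", "TCP-CENTRIC", "BALANCED"}


def qos_settings_dir(hwsku_dir: Path, sonic_qos_profile: Optional[str]) -> Path:
    """Return the folder holding the MMU and QoS settings to use."""
    if sonic_qos_profile in KNOWN_PROFILES:
        return hwsku_dir / sonic_qos_profile
    # Attribute missing (or, by assumption, unrecognized): use the defaults.
    return hwsku_dir


hwsku = Path("/usr/share/sonic/device/example-platform/example-hwsku")  # hypothetical
print(qos_settings_dir(hwsku, "RDMA-CENTRIC"))
print(qos_settings_dir(hwsku, None))
```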
Signed-off-by: Neetha John <nejo@microsoft.com>