#### Why I did it
Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely
#### How I did it
Remove from critical process list
Why I did it
Platform cases test_tx_disable, test_tx_disable_channel, test_power_override failed in dx010.
How I did it
Add i2c access algorithm for CPLD i2c adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
To support 64 cores on arista skus. Fixesaristanetworks/sonic#77
Remapped recycle ports to lowers core port ids and set appl_param_nof_ports_per_modid to 64.
Co-authored-by: Sambath Kumar Balasubramanian <63021927+skbarista@users.noreply.github.com>
Why I did it
It is to fix the broadcom build failure, it is caused by the build image docker-dhcp-relay:latest not found.
2022-12-14T00:09:57.5464893Z [ FAIL LOG START ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5466036Z Attempting docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5467113Z Obtained docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5468206Z Loading docker image target/docker-dhcp-relay.gz
2022-12-14T00:09:57.5469361Z Loaded image: docker-dhcp-relay:internal.65852159-11ad82a07a
2022-12-14T00:09:57.5470686Z Tagging docker image docker-dhcp-relay:latest as docker-dhcp-relay-sonic:latest
2022-12-14T00:09:57.5471997Z Error response from daemon: No such image: docker-dhcp-relay:latest
2022-12-14T00:09:57.5473122Z [ FAIL LOG END ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5539792Z make: *** [slave.mk:1180: target/docker-dhcp-relay.gz-load] Error 1
2022-12-14T00:09:57.5540958Z make: *** Waiting for unfinished jobs....
The image had been built succeeded
2022-12-13T17:01:59.9046935Z [ finished ] [ target/docker-eventd.gz ]
2022-12-13T17:02:00.4947165Z [ building ] [ target/docker-dhcp-relay.gz ]
2022-12-13T17:02:00.6688627Z /sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
2022-12-13T17:02:41.1123955Z /sonic
2022-12-13T17:07:04.1786069Z [ finished ] [ target/docker-dhcp-relay.gz ]
But it was tagged by another value:
Obtained docker image lock for docker-dhcp-relay save
Tagging docker image docker-dhcp-relay-sonic:latest as docker-dhcp-relay:internal.65852159-11ad82a07a
Saving docker image docker-dhcp-relay:internal.65852159-11ad82a07a
Released docker image lock for docker-dhcp-relay save
Removing docker image docker-dhcp-relay-sonic:latest
Untagged: docker-dhcp-relay-sonic:latest
target/docker-dhcp-relay.gz
File /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz saved in cache
[ CACHE::SAVED ] /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz
How I did it
When the feature SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD not enabled, always save as the latest tag, not use the specify version.
The version is dynamic, it is changed when a new commit checked in, but the image of docker-dhcp-relay is not necessary to change.
What I did:
Added IP Table rule to make sure we do not drop chassis internal traffic on eth1-midpplane when Control Plane ACL's are installed.
Why I did:
When Control Plane ACL's are installed there is default Catch All rule is added to drop all traffic that is not white-listed explicitly https://github.com/sonic-net/sonic-host-services/blob/master/scripts/caclmgrd#L735. In this case Internal Traffic between Supervisor and LC will get drop. To fix this added explicit rule to allow all traffic coming from eth1-midplane.
Fixes#11873.
When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.
In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305
- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service
- How to verify it
Regression test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.
system-health indicates errors in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
system-health process error messages were also altered to indicate which container had the issue; multiple containers may run processes with the same name, which can result in identical system-health error messages, causing ambiguity.
How I did it
Port container_checker logic from #11442 into service_checker for system-health.
How to verify it
Bringup Supervisor card with one or more missing fabric cards. Execute 'show system-health summary'. The command should not report failure due to missing dockers for the asics on the fabric cards which are not present.
Why I did it
Code drop with incremental fixes for Cisco 8808 and Cisco 8101-32FH-O platforms
Support for ordered ECMP capability
Support for BFD Serviceability CLI
Support for VxLAN Serviceability CLI
Support for 200G Fabric port speed for Fabric Card 8808-FC0 and Line Card 88-LC0-36FH-M/88-LC0-36FH-MO
PFC support for Q200 ASIC based line cards
Support for Single-TOR and Dual-TOR on Cisco 8102-64H-O Platform
How I did it
update cisco 8000 tag
How to verify it
linkmgrd:
* d2227d8 2023-02-22 | [active-standby] Toggle to standby if link down and config auto (#173) (HEAD -> 202205) [Longxiang Lyu]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Update sonic-swss submodule pointer to include the following:
* 3aeb4be Align watermark flow with port configuration ([#2672](https://github.com/sonic-net/sonic-swss/pull/2672))
Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
Cherry pick PR(#13894)
Azure pipeline change.
Use common template to make it easy to change common steps. Fix docker hang issue.
How I did it
How to verify
Why I did it
Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- previously "make reset" was expecting user input from the terminal
to do its job
- setting UNATTENDED to any non-zero string will allow "make reset" to
run without interactive confirmation
Signed-off-by: Mathieu Launay <m.launay@criteo.com>
Fixes#13568
Backport of #13837
Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:
admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.
- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.
- How to verify it
Upgrade from 201911 to 202205
202205 to 201911 downgrade
202205 -> 202205 reboot
ONIE -> 202205 boot (First FW burn)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message.
- How I did it
Don't use hardcoded 64, get logical port number instead.
- How to verify it
Manual test
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
#### Why I did it
Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:
```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```
A wrong file name is generated (e.g database.**servcee**).
#### How I did it
Fixed overlapping strings being passed to strcpy/strcat that receive restirct* pointers (strings should not overlap).
#### How to verify it
Perform first boot and observe services start immidiatelly after boot.
Why I did it
Change the mirror config file
Use the files/build/versions/default/versions-mirror only when reproducible build enabled.
The config in files/build/versions is only for reproducible build, while snapshot mirror feature does not have the dependency on the reproducible build.
How I did it
Skip the mirror config in files/build/versions/default/versions-mirror if reproducible build not enabled.
How to verify it
Co-authored-by: xumia <59720581+xumia@users.noreply.github.com>
This is to reduce writes to the SSD on the device.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).
This issue currently affect 4 products:
DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64
How I did it
This change disable NCQ on the affected drive for a small set of products.
How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)
Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)
READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))
READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)