Why I did it
enable sai-ptf logger in sai_adapter to log all the sai api invcations
How I did it
add build parameter to enable the sai-ptf logger when build sai PRC
How to verify it
local build test
test the generated sai_adapter
test with pipeline
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Previously, we set timeout in each step such as Lock testbed, Prepare testbed, Run test and KVM dump. When some issue suck like retry happens in one step, it will cause timeout error, but actually, it only needs more time to success. In this pr, we remove the timeout limit in each step and control the timeout outside in each job. When the job runs more than four hours, it will be cancelled.
Why I did it
Previously, we set timeout in each step such as Lock testbed, Prepare testbed, Run test and KVM dump. When some issue suck like retry happens in one step, it will cause timeout error, but actually, it only needs more time to success. In this pr, we remove the timeout limit in each step and control the timeout outside in each job. When the job runs more than four hours, it will be cancelled.
How I did it
Remove the timeout parameter in each step, and control the timeout outside in each job.
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
In order to make the sai update easier, change the URL pattern to a more unified format, which can be update automated latter.
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
# Why I did it
Publish docker saiserverv2 in the build pipeline.
# How I did it
Add docker saiserverv2 target in the build template.
# How to verify it
Run test in #12836 and the target has been built out successfully.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Previously, we hard code the min and max numbers of instance in a plan. In this pr, we support passing the instance numbers of a testplan.
Why I did it
Previously, we hard code the min and max numbers of instance in a plan. In this pr, we support passing the instance numbers of a testplan.
How I did it
Use a variable to set the instance number.
Signed-off-by: Neetha John <nejo@microsoft.com>
This is to backport #12626
Why I did it
There is a need to have separate profiles on compute and storage and this infra update will help achieve that
How I did it
Moved buffer pool/profile and qos definitions on TD2 to a common folder and all TD2 hwsku's will reference that folder
bugfix[2024] vnet route check exit code fix.
Show tech save file to skip non-file entries in cisco-8000 platform
[armhf][sonic-installer] [cherry-pick to branch 202012] Fix the issue of sonic-installer list after set-default and cleanup
[show][muxcable] Catch port Value error exception
[dualtor][202012] Send arp/ndp packets from vlan_mac if the device is dualtor
[show] vnet advertised-route command
[show][muxcable] add support for show mux firmware version all
Signed-off-by: Kevin(Shengkai) Wang <shengkaiwang@microsoft.com>
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
Arista 7060 platform has a rare and unreproduceable PCIe timeout that could possibly be solved with increasing the switch PCIe timeout value. To do this we'll call a script for this platform to increase the PCIe timeout on boot-up.
No issues would be expected from the setpci command. From the PCIe spec:
"Software is permitted to change the value in this field at any
time. For Requests already pending when the Completion
Timeout Value is changed, hardware is permitted to use either
the new or the old value for the outstanding Requests, and is
permitted to base the start time for each Request either on when
this value was changed or on when each request was issued. "
How I did it
Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in device data folder.
Additionally, add pciutils dependency to docker-orchagent so it can run the setpci commands.
How to verify it
On bootup of an Arista 7060, can execute:
lspci -vv -s 01:00.0 | grep -i "devctl2"
In order to check that the timeout has changed.
#### Why I did it
To fix https://github.com/Azure/sonic-buildimage/issues/9643
#### How I did it
Instead of ast.literal_eval added python2 compat code for json strings unicode -> str convertion.
We need python2 compatibility since py2 sonic config engine (buster/sonic_config_engine-1.0-py2-none-any.whl target) is still included into the build (ENABLE_PY2_MODULES flag is set for buster). Once we abandon buster and python2, this compat and ast.literal_eval could be cleaned up all through the code base.
#### How to verify it
run steps from the linked issue
Why I did it
The current lazy installer relies on a filename sort for both unpack and configuration steps. When systemd services are configured [started] by multiple packages the order is by filename not by the declared package dependencies. This can cause the start order of services to differ between first-boot and subsequent boots. Declared systemd service dependencies further exacerbate the issue (e.g. blocking the first-boot script).
The current installer leaves packages un-configured if the package dependency order does not match the filename order.
This also fixes a trivial bug in [Build]: Support to use symbol links for lazy installation targets to reduce the image size #10923 where externally downloaded dependencies are duplicated across lazy package device directories.
How I did it
Changed the staging and first-boot scripts to use apt-get:
dpkg -i /host/image-$SONIC_VERSION/platform/$platform/*.deb
becomes
apt-get -y install /host/image-$SONIC_VERSION/platform/$platform/*.deb
when dependencies are detected during image staging.
How to verify it
Apt-get critical rules
Add a Depends= to the control information of a package. Grep the syslog for rc.local between images and observe the configuration order of packages change.
Why I did it
1.57.x SDK based incremental drop that addresses a few egress ACL and drop counter failures. Hostname, vtysh, and incorrect queue watermark issue are addressed too.
How I did it
Update cisco-8000 submodule to v0.2.3
How to verify it
Which release branch to backport (provide reason below if selected)
* Fix kube mode to local mode long duration issue
* Remove IPV6 parameters which is not necessary
* Fix read node labels bug
* Tag the running image to latest if it's stable
* Disable image_version_higher check
* Change image_version_higher checker test case
Signed-off-by: Yun Li <yunli1@microsoft.com>
#### Why I did it
Include below commits
```
00b4dc0 2022-11-14 | Remove error logging on "failed in fdb_vlanmac" (#272) [Qi Luo]
792afe8 2022-11-14 | Don't cache the vlan-id if it is not valid from DB (#273) [zhenggen-xu]
```
Add dualtor test jobs using TestbedV2 in 202012 branch.
Why I did it
Add dualtor test jobs using TestbedV2 in 202012 branch.
How I did it
Add dualtor test jobs using TestbedV2 in 202012 branch.
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Why I did it
nameserver and domain entries from build system fsroot gets into sonic image.
How I did it
Clear /etc/resolv.conf before building image
How to verify it
Built image with it and verified with install that /etc/resolv.conf is empty
git log --oneline aacb772..202012
these commits are added in sonic-platform-daemons
8ec96c0 (HEAD -> 202012, origin/202012) [ycabled] fix exception-handling logic for ycabled (#312)
3d5470d [ycabled] move swsscommon API's from subroutines to call them exactly once per task_worker/thread (#310)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
- Why I did it
ethtool is not able to read certain pages(eg. page 11h) of CMIS cables.
SDK provides a set of sysfs to expose the transceiver EEPROM, now we migrate from using ethtool to read these sysfs for transceiver EEPROM reading.
- How I did it
replace ethtool with accessing the SDK sysfs for cable EEPROM reading.
Adjust the offset according to the SDK sysfs memory map.
- How to verify it
run sonic-mgmt sfp-related regression test case.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes:
New functionality:
1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system
Fix the following issues:
1. On very rare occasion (~1/100K), during I2C transaction with MMS1V50-WM and MMS1V90-WR modules on SN4700 system, the module may send unexpected stop which violate the I2C specification, possibly affecting the link up flow
2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed
3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received
4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted
5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU
6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized
7. SLL configuration is missing in SDK dump
8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero)
9. PCI calibration changes from a static to a dynamic mechanism
10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event
11. SDK returned error when FEC mode is set on twisted pair, when FEC was set to None
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Signed-off-by: dprital <drorp@nvidia.com>