Why I did it
After PFC interop testing between 8102 and 7050cx3, data packet losses were observed on the Rx ports of the 7050cx3 (inflow from 8102) during testing. This was primarily due to the slower response times to react to PFC pause packets for the 8102, when receiving such frames from neighboring devices. To solve for the packet drops, the 7050cx3 pg headroom size has to be increased to 160kB.
How I did it
Modified the xoff threshold value to 160kB in the pg_profile file to allow for the buffer manager to read that value when building the image, and configuring the device
How to verify it
run "mmuconfig -l" once image is built
Signed-off-by: dojha <devojha@microsoft.com>
Why I did it:
API get_device_runtime_metadata() added by #11795 uses merge operator for dict but that is supported only for python version >=3.9. This API will be be used by scrips eg:hostcfgd which is still build for buster which does not have python 3.9 support.
As part of PR #11754
Change was added to use variable SONIC_DB_NS_CLI for
namespace but that will not work since ./files/scripts/syncd_common.sh
uses SONIC_DB_CLI. So revert back to use SONIC_DB_CLI and define new
variable for SONIC_GLOBAL_DB_CLI for global/host db cli access
Also fixed DB_CLI not working for namespace.
Why I did it
Currently the CLI commands show interface status show interface counters and show interface description displays Ethernet-IB and Ethernet-Rec ports in the output. These are internal ports should only be displayed when the option -d all is used for the above mentioned CLI commands
How I did it
Add the port roles Inb and Rec when classifing a port as internal port.
How to verify it
Verify the CLI output of the command show interface status doesnt display the Ethenet-IB and Ethernet-Rec port when -d all option in not present
Before
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Why I did it
Fix a build not stable issue: #11620
The vs vm has started successfully, but failed to wait for the message "sonic login:".
There were 55 builds failed caused by the issue in the last 30 days.
AzurePipelineBuildLogs
| where startTime > ago(30d)
| where type =~ "task"
| where result =~ "failed"
| where name =~ "Build sonic image"
| where content contains "Timeout exceeded"
| where content contains "re.compile('sonic login:')"
| project-away content
| extend branchName=case(reason=~"pullRequest", tostring(todynamic(parameters)['system.pullRequest.targetBranch']),
replace("refs/heads/", "", sourceBranch))
| summarize FailedCount=dcount(buildId) by branchName
branchName FailedCount
master 37
202012 9
202106 4
202111 2
202205 1
201911 1
It is caused by the login message mixed with the output message of the /etc/rc.local, one of the examples as below: (see the message rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login: )
The check_install.py was waiting for the message "sonic login:", and Linux console was waiting for the username input (the login message has already printed in the console).
https://dev.azure.com/mssonic/build/_build/results?buildId=123294&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=359769c4-8b5e-5976-a793-85da132e0a6f
2022-07-17T15:00:58.9198877Z [ 25.493855] rc.local[307]: + onie_disco_opt53=05
2022-07-17T15:00:58.9199330Z [ 25.595054] rc.local[307]: + onie_disco_router=10.0.2.2
2022-07-17T15:00:58.9199781Z [ 25.699409] rc.local[307]: + onie_disco_serverid=10.0.2.2
2022-07-17T15:00:58.9200252Z [ 25.789891] rc.local[307]: + onie_disco_siaddr=10.0.2.2
2022-07-17T15:00:58.9200622Z [ 25.880920]
2022-07-17T15:00:58.9200745Z
2022-07-17T15:00:58.9201019Z Debian GNU/Linux 10 sonic ttyS0
2022-07-17T15:00:58.9201201Z
2022-07-17T15:00:58.9201542Z rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login:
2022-07-17T15:00:58.9202309Z [ 26.079767] rc.local[307]: + onie_exec_url=file://dev/vdb/onie-installer.bin
How I did it
Input a newline when finished to run the script /etc/rc.local.
If entering a newline, the message "sonic login:" will prompt again.
Why I did it
Content of platform.json was outdated and some platform_tests/api of sonic-mgmt were failing.
How I did it
Added the necessary values to platform.json
How to verify it
Running platform_tests/api of sonic-mgmt should yield 100% passrate.
Port index 22 is associated with phy23_config.json, then same port index 22 in phy24_config.json may cause gearbox port creation error. Port Ethernet22 maps to index 23.
update sai module with commit
- 566d4a8ef2 2022-08-11 | [SAI-PTF] Enable saiserverv2 with syncd-rpc and fix saithriftv2 build (#1552) (#1533) (#1514) (#1492) (#1558) (#1557) [Richard.Yu]
- a1796a53cc 2022-08-11 | Add support of mdio IPC server class using sai switch api and unix socket
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Added initial set of config files to allow for booting and partial traffic testing in SONiC on the 720DT-48S.
How to verify it
- Switch boots
- show interfaces status shows links up on interfaces Ethernet24-51
- Traffic flows with no errors on interfaces Ethernet24-51
* [SAIServer] support saiserver v2 in bullseye
Support build saiserverv2 in bullseye
Add dependencies for building saiserverv2
Upgrade libboost-atomic1.71 to libboost-atomic1.74
Test done:
Local builded with NOSTRETCH=y NOJESSIE=y NOBUSTER=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
* update libboost-atomic from 1.71 to 1.74 for bullseye
#### Why I did it
Fix the build failure caused by the installer image size too small. The installer image is only used during the build, not impact the final images.
See https://dev.azure.com/mssonic/build/_build/results?buildId=139926&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=359769c4-8b5e-5976-a793-85da132e0a6f
```
+ fallocate -l 2048M ./sonic-installer.img
+ mkfs.vfat ./sonic-installer.img
mkfs.fat 4.2 (2021-01-31)
++ mktemp -d
+ tmpdir=/tmp/tmp.TqdDSc00Cn
+ mount -o loop ./sonic-installer.img /tmp/tmp.TqdDSc00Cn
+ cp target/sonic-vs.bin /tmp/tmp.TqdDSc00Cn/onie-installer.bin
cp: error writing '/tmp/tmp.TqdDSc00Cn/onie-installer.bin': No space left on device
[ FAIL LOG END ] [ target/sonic-vs.img.gz ]
```
#### How I did it
Increase the size from 2048M to 4096M.
Why not increase to 16G like qcow2 image?
The qcow2 supports the sparse disk, although a big disk size allocated, but it will not consume the real disk size. The falocate does not support the sparse disk. We do not want to allocate a very big disk, but no use at all. It will require more space to build.
* [sonic_py_common] Cache Static Information in device_info to speed up CLI response (#11696)
- Why I did it
Profiled the execution for the following cmd intfutil -c status
- How I did it
Cached the following information:
1. get_sonic_version_info()
2. get_platform_info()
None of the API exposed to the user libraries (for eg: sonic-utilities) has been modified
These methods involve reading text files or from redis. Thus, caching helped to improve the execution time
- How to verify it
Added UT's.
Verified on the device
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
* Removed UT since libswsscommom dep is missing in <=202205
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Why I did it
The directory /var/warmboot as top directory for warmboot feature is also needed in docker gbsyncd. Some vendor SAI might save data under it. Without it, the SAI init/creation API failure has happened on PikeZ platform.
How I did it
Mount host directory /host/warmboot as /var/warmboot in docker gbsyncd, which is same as what it has done on docker syncd.
With the Broadcom syncd containers getting upgraded to Bullseye, the DNX
RPC container is no longer automatically built. Explicitly add a make
command to build it.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
VoQ chassis supervisor will have Fabric asics and the sub_role for fabric asics will be "Fabric".
The fabric asics namespaces are not being returned in get_all_namespaces() and is required in caclmgrd to add right cacl to allow internal docker traffic from fabric asic namespaces.
test_cacl_application fails on VoQ chassis Supervisor with the error:
Failed: Missing expected iptables rules: set(['-A INPUT -s 240.127.1.1/32 -d 240.127.1.1/32 -j ACCEPT', '-A INPUT -s 240.127.1.3/32 -d 240.127.1.1/32 -j ACCEPT', '-A INPUT -s 240.127.1.2/32 -d 240.127.1.1/32 -j ACCEPT'])
How I did it
Update get_all_namespaces to return fabric namespaces list.
How to verify it
Verified on VoQ chassis.
Why I did it
Migrate FRR to bullseye
How I did it
Makefile and docker config changes to refer to bullseye instead of buster.
How to verify it
Build bullseye frr docker.
Co-authored-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Why I did it
It solves a swss orchagent crash issue on PikeZ device, due to link-training setting of external PHY port.
How I did it
Catch up the fix for CS00012257483 in version 7.1.7.2.
Change `sxdkernel start` to `sxdkernel restart`. If `syncd` service crashes in `ExecStartPre` systemd will not call `ExecStop` and thus will not call `sxdkernel stop`. Use of `sxdkernel restart` is more robust in terms of guarantees to restore the system after unexpected crashes.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
This new breakout mode is required when a QSFP cable is used on the QSFP-DD supported 4700 port. since QSFP only uses the first 4 lanes, this mode is required to restrict the child ports to only use the first four lanes
- How I did it
Updated the platfrom.json file with the extended data
- How to verify it
Tested on one port:
root@msn-4700:/home/admin# show int status
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------------------------------- ------- ----- ----- ------- ------ ------ ------- ----------------------------------------------- ----------
Ethernet0 0 25G 9100 N/A etp1a routed up up QSFP28 or later N/A
Ethernet1 1 25G 9100 N/A etp1b routed down up N/A N/A
Ethernet2 2 25G 9100 N/A etp1c routed down up N/A N/A
Ethernet3 3 25G 9100 N/A etp1d routed down up N/A N/A
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.
The monit 'container_checker' fails in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
- Why I did it
To update MFT package to the latest version.
- How I did it
Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk.
- How to verify it
Run regression testing using tests from sonic-mgmt
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Cherry pick PR ttps://github.com/sonic-net/sonic-buildimage/pull/10996 to 202205 branch
Updating sonic-utilities sub module with the following commits
ca785a2 Remove sonic-db-cli
#### Why I did it
To fix sonic-db-cli high CPU usage on SONiC startup issue: https://github.com/sonic-net/sonic-buildimage/issues/10218
sonic-db-cli re-write with c++ and move to sonic-swss-common repo.
#### How I did it
#### How to verify it
#### Which release branch to backport (provide reason below if selected)
#### Description for the changelog
ca785a2 Remove sonic-db-cli
#### A picture of a cute animal (not mandatory but encouraged)
This was done manually, to try to get past a build error due to changing
package versions in Debian.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* [snmpd]: Update to 5.9+dfsg-4+deb11u1 to match Debian version
This brings in some security fixes.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Update snmpd makefile
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Remove binNMU for snmpd
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Update the bcm config file system_ref_core_clock_khz param to
handlesystems with J2cplus linecards.
We need system_ref_core_clock_khz to be set to 1600000 for supporting j2
and j2cplus linecards on the same chassis.
Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
Cr space timeout on Hold and Release GW - at warm boot
SPC-1 Port in stuck PHY_UP after peer side rebooted
memory leak in sx_api_router_ecmp_update_set
How I did it
update the make file with the new version number
update submodule Switch-SDK-drivers pointer
How to verify it
run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did:
In case of multi-asic platforms gbsyncd is not getting added to Feature Table of Host Config DB. Without this container_checker complains of not needed gbsyncd container's are running.
How I did:
Update Both Host and Namespace config db when gbsyncd docker is starting.
How I verify:
Verified on Multi-asic platforms.