Why I did it
In the voq chassis the buffer_queue configuration needs to be applied on system_port instead of the sonic port.
This PR has the change to do this.
How I did it
Modify buffer_config.j2 to generate buffer_queue configuration on system_ports if the device is Voq Chassis
How to verify it
Verify the buffer_queue configuration is generated properly using sonic-cfggen
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
- Why I did it
The followup to #12920 PR.
If the feature compilation is disabled its configuration should not be included into init_cfg.json.
- How I did it
Update init_cfg.json.j2 template to include teamd and radv features configuration only if their compilation is enabled.
- How to verify it
The default behavior is preserved. To verify the changes compile the image without overriding INCLUDE_TEAMD and INCLUDE_ROUTER_ADVERTISER options. The generated /etc/sonic/init_cfg.json should remain with no changes. Install the image and verify that both teamd and radv containers are present and running. Verify that feature state returned by show feature status command is enabled.
Change the INCLUDE_TEAMD or INCLUDE_ROUTER_ADVERTISER value to "n". Compile and install the image. Verify that feature configuration is not included in generated /etc/sonic/init_cfg.json file. Verify that show feature status output doesn't include the feature.
Why I did it
Currently sonic-slave-* tag is confusing. Set correct tag on sonic-slave-* image.
Fix job name to fit the build.
How I did it
build amd image in amd64:
sonic-slave-bullseye:cfe29bff67c
sonic-slave-bullseye:latest
sonic-slave-bullseye:master
build armhf image in amd64:
sonic-slave-bullseye-march-armhf:33614806dc3
sonic-slave-bullseye-march-armhf:latest
sonic-slave-bullseye-march-armhf:master
build arm64 image in amd64:
sonic-slave-bullseye-march-arm64:f3b1b16c801
sonic-slave-bullseye-march-arm64:latest
sonic-slave-bullseye-march-arm64:master
build arm64 image in arm64:
sonic-slave-bullseye:75cb326c9a7
sonic-slave-bullseye-arm64:latest
sonic-slave-bullseye:master
build armhf image in armhf:
sonic-slave-bullseye:64d178951fc
sonic-slave-bullseye-armhf:latest
sonic-slave-bullseye:master
How to verify it
- Why I did it
Remove dependency on interfaces-config.service to speed up boot, because interfaces-config.service takes a lot of time on boot.
- How I did it
Changed service files for swss, syncd.
- How to verify it
Boot and check swss/syncd start time comparing to interfaces-config
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Avoid traceback on sonic-clear command
sonic-clear dhcp6relay_counters
Traceback (most recent call last):
File "/usr/local/bin/sonic-clear", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/clear/plugins/dhcp-relay.py", line 19, in dhcp6relay_clear_counters
counter = DHCPv6_Counter()
NameError: name 'DHCPv6_Counter' is not defined
- How I did it
Corrected the way to import using importlib
- How to verify it
Tested the sonic-clear command and verified no traceback is seen
Why I did it
tag sonic-slave-* image with branch.
Only keep sonic-slave-* latest tag when it is master branch and amd64 arch.
How I did it
How to verify it
Updating sonic-swss to latest to include following fixes -
0d91125 [bufferorch] : Support for buffer profiles for VoQ on chassis (#2465)
94429f1 Fixed a bug causing error state of same configuration is applied twice. (#2580)
f1c0a75 Update FDB state table when , MAC entries are modified as dynamic_local. (#2575)
beaac71 [voq][chassis]Add show fabric counters port/queue commands (#2522)
44d1e9c Fix test_vlan.py (#2541)
c00455a Only collect stdout of orchagent_restart_check in vstest (#2578)
def98d9 Remove TODO comments which are no longer needed (#2568)
- Why I did it
Support syslog rate limit configuration feature
- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration
- How to verify it
Manual test
New sonic-mgmt regression cases
Why I did it
To keep 'Request for xxx branch' label when finished auto-cherry-pick.
How I did it
Change logic in post cherry pick action.
How to verify it
Why I did it
In PR check pipelines, there are too many duplicated warnings:
fatal: No names found, cannot describe anything.
SONIC_IMAGE_VERSION will not change in one build. We don't need to calculate in every reference. We just need calculate one time, then record it.
In Makefile, '=' will calculate again and again when it is referred.
How I did it
Fix it in Makefile.
How to verify it
Check this PR's check pipeline result.
- Why I did it
In order to prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test failing on the log analyzer step.
The mentioned test is performing the sfputil reset EthernetX for every interface on the SONiC switch, this action will flap the SFP device status (INSTERTED -> REMOVED -> INSTERTED).
The SONiC XCVRD daemon will catch this SFP device status change (because it is monitoring the presence status of the cable).
To judge the cable presence status, currently, we are still leveraging to read the first bytes of the EEPROM, and the EEPROM could be not ready at some moment and the SONiC XCVRD daemon will print the error log to Syslog:
ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register
- How I did it
Change logging severity from ERR to WARNING
- How to verify it
Run the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py
OR much faster way to run the next script on the switch:
#!/bin/bash
START=0
END=248
for (( intf=$START; intf<=$END; intf+=8))
do
sfputil reset Ethernet"${intf}"
done
sfputil show presence
- Why I did it
Following code to judge whether a process is running inside a docker could get stuck on the simx platform
subprocess.Popen(["docker", "--version"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True)
When it gets stuck, the config-chassisdb service can not be successfully started, thus the system can not be booted up.
root@sonic:/# service config-chassisdb status
config-chassisdb.service - Config chassis_db
Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago
Main PID: 571 (config-chassisd)
Tasks: 14 (limit: 9501)
Memory: 132.4M
CGroup: /system.slice/config-chassisdb.service
├─571 /bin/bash /usr/bin/config-chassisdb
├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform
├─602 /bin/sh -c sudo decode-syseeprom -m
├─603 sudo decode-syseeprom -m
├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m
├─616 /bin/sh -c docker --version 2>/dev/null
└─617 docker --version
- How I did it
Use an alternative way to implement this function and issue can be avoided:
docker_env_file = '/.dockerenv'
return os.path.exists(docker_env_file) is False
- How to verify it
run regression on real hardware and simx platform.
Why I did it
Makefile needs some dependencies from the Internet. It will fail for network related issue.
Retries will fix most of these issues.
How I did it
Add retries when running commands which maybe related with networking.
How to verify it
Why I did it
Add two platform that support s3IP framework
How I did it
Add two platforms supporting S3IP SYSFS (TCS8400, TCS9400)
How to verify it
Manual test
Adding platform support for FS s5800-48t4s and s5800-48t8s-mars8p.
Both s5800-48t4s and s5800-48t8s-mars8p have 48 * 10/100/1000 Base-T ports, 4 * 10GE SFP+ Ports on Centec TsingMa.
s5800-48t4s is different from s5800-48t8s-mars8p in that:
The phy chip used by s5800-48t4s is Marvell 88e1680;
The phy chip used by s5800-48t4s-mars8p is Centec ctc21108;
Why I did it
Fixes#12634
Observing the following error while running 'sfputil show lpmode' command.
AttributeError: 'Sfp' object has no attribute 'get_power_set'
Root Cause: get_power_set() is defined for QSFP28 and QSFP+ i.e. Sff8636 and Sff8634. However, the function is not defined in the optoe_base class.
How I did it
To use get_power_set(), we need to initialise the 'api' via get_xcvr_api() and then use it to run get_power_set().
Why I did it
make clean is broken after #12000:
bash: -c: line 1: syntax error near unexpected token `;'
bash: -c: line 1: `make -f slave.mk PLATFORM= PLATFORM_ARCH=amd64 MULTIARCH_QEMU_ENVIRON=n
...
MIRROR_URLS= MIRROR_SECURITY_URLS= Q=@ clean; ; '
make[1]: *** [Makefile.work:531: clean] Error 2
How I did it
Remove a conditional for clean command.
Signed-off-by: Konstantin Vasin <k.vasin@yadro.com>
Why I did it
It's possible to speed up some parts of a build using parallel compression/decompression.
This is especially important for build_debian.sh.
How I did it
pigz is a parallel implementation of gzip: https://zlib.net/pigz/
Some programs like docker and mkinitramfs can automatically detect and use it instead of gzip.
For tar we need to select it directly.
To enable this feature you need to set GZ_COMPRESS_PROGRAM=pigz
- Consolidating multiple read functions in a PSU driver on the basis of byte, word or block read,
- Enhancing PDDF parsing script support for CPU and PCH temperature reading,
- Adding missing methods in PDDF common APIs
Why I did it
- PSU driver changes are to optimize the code and increase the code coverage
- PDDF parser script enhancements to accommodate the CPU and PCH temp reading using hwmon device path
- Some of the new APIs were missing from the PDDF common platform classes
How I did it
Added code changes and verified them on AS7816 adn AS7726 platforms.
Why I did it
This is to fix test_forward_ip_packet_with_0xffff_chksum_tolerant test failure on 720DT-48S. IP packets with checksum set to 0xffff will be forwarded with the same checksum on this platform, instead of updating to the correct value.
How I did it
Add bcm config sai_verify_incoming_chksum=0 so that checksum is updated instead of being left unchanged when checksum is 0xffff. Note that packets with invalid checksum are still dropped with this config.
Why I did it
Ubuntu 22.04 uses cgroup2 by default, but docker.sh doesn't mount it.
As a result we get an error when trying to run docker info in chroot env:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
How I did it
mount cgroup2 in chroot if all enabled kernel cgroup controllers are currently not in use by cgroup1
So we need to mount cgroup in chroot environment on /sys/fs/cgroup.
Because inside chroot we don't know which cgroup version is used by the host we have two possible solutions:
cgroup tree for chroot is mounted by the host (it was my 1st version of this fix)
cgroup tree is mounted inside chroot based on info from /proc/cgroups (it's current version of this fix)
My 2nd version based on this code from systemd: 5c6c587ce2/src/shared/cgroup-setup.c (L35-L74)
We parse info from /proc/cgroups
Skip header line started from #
Skip controller if it's disabled (4th column = 0)
Count number of controllers with non-zero of hierarchy_id (2nd column)
If this number is not zero then we assume some of controllers are used by host system and the host system uses hybrid or legacy cgroup tree. In this case we can't use unified cgroup tree inside chroot and mount old cgroup tree (v1).
If this number is zero then we assume host system uses unified cgroup tree and we need to mount cgroup2 inside chroot.
Signed-off-by: Konstantin Vasin <k.vasin@yadro.com>
Why I did it
On DNX (J2/J2c/J2c+) platforms, Single Path Nexthops and ECMp Nexthop resources(FECs) are shared. BRCM SAI do not have partition of this resource, and hence more single path Nexthop entries, causes ECMP programming to fail in scaled setup.
How I did it
Broadcom provided SAI changes to reserve resources for single path nexthop entries(More details in CSP: https://brcmsemiconductor-csm.wolkenservicedesk.com/wolken-support/allcases/request-details?requestId=CS00012251649).
Along with SAI changes, they provided configurable Macro/flag to reserve NON_ECMP entries.
This PR is to add that flag in various sai.profile files wherever applicable.
PS: We are reserving 3072 single path Nexthop entries on each Linecard. Calculation is as follows.
Max Slots per chassis: 8
Max No of Ports(each LC): 64
MyIP/Subnet Entries per port: 4(v4/v6)
Nbr Entries Per port: 2(v4/v6)
Total Non_ECMP Count: 8x64x(4+2) = 3072
How to verify it
Without this change, the ECMP group count will be shown as Max_count in 'crm show resources all' command, and with this change the ECMP group count will be decreased by 24(3072/128).
Port indexes of front panel ports are not contiguous in multi-asic because we didn't distiguish between
front panel and internal ports, e.g., recycle ports. Fix this by assigning index to front panel port first
and then internal ports.
Why I did it
Provide CPLD and FPGA driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Fix#12279
Why I did it
Curl can fail when we calculate md5sum of web package.
E.g. if server responsed with 503 error.
But we don't validate this and pass any output from curl directly to md5sum.
After that we save incorrect md5 hash to versions-web file.
How I did it
use option --retry 5 for transient errors (default value is 0)
use option -f for curl and set -o pipefail for shell to detect errors
stop build if curl failed
Signed-off-by: Konstantin Vasin <k.vasin@yadro.com>
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.
How I did it
add coalesce-time 10000 to the frr bgp startup config.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
MarkDown linter "mdl" reports many warnings on README.md.
Let them get fixed to ease its maintenance and readability.
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
- Running "sudo pip install" can be OK in a CI but generates a clear
warning. It is strongly disadvised to run it in a user or a production
environment because it can break the host packages system.
"pip install --user" or venv are usually prefered in most situations.
- Nested virtualization support is not always enough to build OVS image
inside a VM. The full KVM interface must be exposed, what may require
extra configuration such as the KVM paravirtualization in VirtualBox.
- On the recommended version of Ubuntu (20.0.4), installing docker CE
with the given process does not remove docker from snap
when a previous installation of the distribution package docker.io
was already present in the system.
Using docker through snap currently triggers a bug during the SONiC
build process.
https://stackoverflow.com/questions/52526219/docker-mkdir-read-only-file-system
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Why I did it
It is to fix the broadcom build failure, it is caused by the build image docker-dhcp-relay:latest not found.
2022-12-14T00:09:57.5464893Z [ FAIL LOG START ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5466036Z Attempting docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5467113Z Obtained docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5468206Z Loading docker image target/docker-dhcp-relay.gz
2022-12-14T00:09:57.5469361Z Loaded image: docker-dhcp-relay:internal.65852159-11ad82a07a
2022-12-14T00:09:57.5470686Z Tagging docker image docker-dhcp-relay:latest as docker-dhcp-relay-sonic:latest
2022-12-14T00:09:57.5471997Z Error response from daemon: No such image: docker-dhcp-relay:latest
2022-12-14T00:09:57.5473122Z [ FAIL LOG END ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5539792Z make: *** [slave.mk:1180: target/docker-dhcp-relay.gz-load] Error 1
2022-12-14T00:09:57.5540958Z make: *** Waiting for unfinished jobs....
The image had been built succeeded
2022-12-13T17:01:59.9046935Z [ finished ] [ target/docker-eventd.gz ]
2022-12-13T17:02:00.4947165Z [ building ] [ target/docker-dhcp-relay.gz ]
2022-12-13T17:02:00.6688627Z /sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
2022-12-13T17:02:41.1123955Z /sonic
2022-12-13T17:07:04.1786069Z [ finished ] [ target/docker-dhcp-relay.gz ]
But it was tagged by another value:
Obtained docker image lock for docker-dhcp-relay save
Tagging docker image docker-dhcp-relay-sonic:latest as docker-dhcp-relay:internal.65852159-11ad82a07a
Saving docker image docker-dhcp-relay:internal.65852159-11ad82a07a
Released docker image lock for docker-dhcp-relay save
Removing docker image docker-dhcp-relay-sonic:latest
Untagged: docker-dhcp-relay-sonic:latest
target/docker-dhcp-relay.gz
File /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz saved in cache
[ CACHE::SAVED ] /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz
How I did it
When the feature SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD not enabled, always save as the latest tag, not use the specify version.
The version is dynamic, it is changed when a new commit checked in, but the image of docker-dhcp-relay is not necessary to change.
Why I did it
Update ECMP calculator README file with new instructions how to run the calculator.
How I did it
Update README file.
How to verify it
Read README file.
docker-sonic-vs doesn't have the infra needed for the syslog rate limit
configuration, so it's not going to be rendering jinja templates to
overwrite /etc/rsyslog.conf. This also means that syslog messages would
get logged twice (because both the default /etc/rsyslog.conf file and
/etc/rsyslog.d/50-default.conf are telling it to log to syslog).
Therefore, keep the custom static /etc/rsyslog.conf file for docker-sonic-vs.
Fixessonic-net/sonic-swss#2570.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
How I did it
1、 demo driver will call the s3ip kernel framework interface
How to verify it
run the demo ,it will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
The user framework module complies with s3ip sysfs specification
How I did it
1、 create a s3ip_sysfs service
2、 the s3ip_sysfs service call the “s3ip_sysfs_tool.sh” to install kernel module and run s3ip_load.py
3、 s3ip_load.py will parse the s3ip_sysfs_conf.json configuration file and create /sys_switch/ directory
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide slot and switch_rootsysfs driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide SYSLED and watchdog driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a sensor driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a transceiver driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification