Why I did it
In the config_db.json generated from minigraph, the "admin_status" attribute is missing for the VOQ inband interface port in the PORT table.
How I did it
Added the admin_status attribute for the VOQ inband interface port if that port exists in the PORT table keys.
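A minimal sketch of the PORT-table change, assuming the PORT table of config_db.json is loaded as a plain dict. The function name, the inband port name `Ethernet-IB0` and the default value `"up"` are hypothetical; the text above does not state which admin_status value is written.

```python
def add_inband_admin_status(port_table, inband_port, admin_status="up"):
    """Add admin_status to the VOQ inband port, but only if it is a PORT table key.

    port_table:   dict mirroring the PORT table (port name -> attribute dict).
    inband_port:  name of the VOQ inband interface port (hypothetical example).
    admin_status: value to add; "up" is an assumption for this sketch.
    """
    if inband_port in port_table:
        # Keep an explicitly configured value; otherwise add the attribute.
        port_table[inband_port].setdefault("admin_status", admin_status)
    return port_table


ports = {"Ethernet0": {}, "Ethernet-IB0": {}}
add_inband_admin_status(ports, "Ethernet-IB0")
print(ports)
```

Using `setdefault` means an admin_status already present in the generated entry is left untouched.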
For multi-ASIC systems, the back-end ASICs use the IP address of Loopback4096 as the BGP router ID. In a VOQ multi-ASIC chassis there are no back-end ASICs: all ASICs are front-end, and the iBGP connections are established via the Ethernet-IB interfaces of the ASICs. Since these ASICs are not designated as BackEnd, the IP address of Loopback0 is used as the BGP router ID. Because Loopback0 has the same IP address on all ASICs in a line card, the same router ID is used for the VOQ iBGP configurations, and therefore the iBGP sessions are not established. Changes are made to fix this.
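The router ID selection described above can be sketched as follows. This is an illustration of the problem, not the actual sonic-cfggen or bgpcfgd code: the `sub_role` values and the loopback dictionary are assumptions, and the addresses are made up. It shows why two front-end ASICs on the same line card end up with identical router IDs.

```python
def select_router_id(sub_role, loopbacks):
    """Return the BGP router ID following the rule described above.

    sub_role:  ASIC role, "BackEnd" or "FrontEnd" (assumed values).
    loopbacks: dict of loopback interface name -> IPv4 address.
    """
    if sub_role == "BackEnd":
        return loopbacks["Loopback4096"]
    # Every non-BackEnd ASIC falls back to Loopback0.
    return loopbacks["Loopback0"]


# Two front-end ASICs on one VOQ line card share the same Loopback0 address
# (addresses are illustrative, from the documentation range 192.0.2.0/24).
asic0 = {"Loopback0": "192.0.2.1", "Loopback4096": "192.0.2.100"}
asic1 = {"Loopback0": "192.0.2.1", "Loopback4096": "192.0.2.101"}

rid0 = select_router_id("FrontEnd", asic0)
rid1 = select_router_id("FrontEnd", asic1)
print(rid0 == rid1)  # True: identical router IDs, so the iBGP sessions fail
```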
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>
Avoid initializing SFP/thermal/components/fan/PSU/LEDs on SimX, and create the vpd_info file in hw_management when running on the Mellanox simulator platform
- Why I did it
This fixes an issue on Mellanox simulator platforms: syseepromd failed in the pmon docker, and "decode-syseeprom" failed as well.
- How I did it
Before initializing thermal/components/fan/PSU/LEDs, check whether we are running on SimX.
Create the vpd_info file in the hw_management folder.
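The two steps above can be sketched as a single initialization function. This is a hypothetical illustration, not the Mellanox platform code: how SimX is detected is not shown (it is passed in as `is_simx`), `init_hw_components` stands in for the real sfp/thermal/components/fan/psu/leds initialization, and the contents of vpd_info are a placeholder because the text does not give them.

```python
import os


def init_platform(is_simx, hw_management_dir, init_hw_components, vpd_info_content=""):
    """Initialize platform peripherals, skipping them on the SimX simulator.

    is_simx:            True when running on the Mellanox simulator (detection not shown).
    hw_management_dir:  directory standing in for the hw_management folder.
    init_hw_components: callable that initializes sfp/thermal/components/fan/psu/leds.
    vpd_info_content:   placeholder; the real contents of vpd_info are not given here.

    Returns the path of the created vpd_info file on SimX, otherwise None.
    """
    if is_simx:
        # No real hardware: skip peripheral init and provide vpd_info instead.
        os.makedirs(hw_management_dir, exist_ok=True)
        vpd_path = os.path.join(hw_management_dir, "vpd_info")
        with open(vpd_path, "w") as f:
            f.write(vpd_info_content)
        return vpd_path
    init_hw_components()
    return None
```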
- How to verify it
Check that the syseepromd process is loaded properly in the pmon docker.
Check that decode-syseeprom runs without errors or warnings.
- Why I did it
To prevent a Python exception when executing the warm-reboot command on the Mellanox simulator platform.
- How I did it
Return None from the watchdog Python script when the watchdog file does not exist.
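A minimal sketch of the "return None instead of raising" pattern, with a caller that logs an error rather than crashing. `WatchdogSketch` and both function names are hypothetical, and `/dev/watchdog` (the standard Linux watchdog device node) is only an assumed default; the path the Mellanox script actually checks is not given here.

```python
import logging
import os


class WatchdogSketch:
    """Hypothetical stand-in for the platform watchdog object."""

    def __init__(self, device_path):
        self.device_path = device_path


def get_watchdog(device_path="/dev/watchdog"):
    """Return a watchdog object, or None if the watchdog device file does not exist."""
    if not os.path.exists(device_path):
        return None
    return WatchdogSketch(device_path)


def check_watchdog_before_reboot(device_path="/dev/watchdog"):
    """Caller side: log an error instead of raising when no watchdog is present."""
    watchdog = get_watchdog(device_path)
    if watchdog is None:
        logging.error("Watchdog device %s not found, continuing without it", device_path)
        return False
    return True
```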
- How to verify it
warm-reboot runs without the Python error. In these cases, an error message appears in the log instead.
To avoid this error message, the watchdog can be simulated on the Mellanox simulator platform.
Why I did it
Update XGS and DNX SAI to 5.0.0.4 and add the additional flags needed in saibcm-modules.
The following CSPs are merged in 5.0.0.4:
CS00012182148 [4.3] Rate Limit Parity error message to syncd/sonic.
CS00012178692 [4.3] ACL drops counted as interface drops
CS00012183901 [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012070713 [SAI 4.3 , DNX, 8690] Everflow ACL creation fails - brcm_sai_dnx_create_acl_table API fails, with unknown attribute error.
CS00012023263 [4.4] TD3/TH2 : Support 4 lossless queues(2 SW PFCWD and 2 HW PFCWD)
CS00012019578 [4.4] Pre FEC bit-error rate (BER) - DNX and XGS (TD and TH 50/100G)
How I did it
Changed the various makefiles to include the new SAI release and updated the opennsl-modules.
Why I did it
Allow users to host their own local Docker registries and use them via the REGISTRY_SERVER and REGISTRY_PORT environment variables.
How I did it
Only set REGISTRY_SERVER and REGISTRY_PORT in rules/config if they are unset.
How to verify it
Export the environment variables REGISTRY_SERVER and REGISTRY_PORT to point to an alternative Docker registry, and export ENABLE_DOCKER_BASE_PULL=y.
Ensure the required sonic-slave docker images are not present locally but are available in the docker registry.
Run make init and make configure.
Confirm that the appropriate docker images were pulled from the appropriate docker registry and not built locally.
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot
Signed-off-by: Dror Prital <drorp@nvidia.com>
- Why I did it
* For SAI - Advance to adopt the following fixes:
1. Better handling of not-implemented object types for resource availability
2. Fix ext dump when saidump is triggered from 2nd process (saidump utility) other than main adapter host (syncd in SONiC)
* For SDK/FW:
- Changes and new features:
1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM).
2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE
3. Improved performance of "per-port-buffer" counters
4. Added support for Kernel 5.10
- Bug fixes:
On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds
Signed-off-by: Dror Prital <drorp@nvidia.com>
Why I did it
Currently, SONiC uses the 'isc-dhcp-relay' package to provide DHCP relay functionality on IPv4 networks only.
This change adds IPv6 (DHCPv6) relay functionality alongside IPv4.
How I did it
Edit the supervisord template to start DHCPv6 relay instances when configured to do so in Config DB.
Align the cfg unit tests with the new change.
Add DHCPv6 relay minigraph parsing support and a suitable t0 topology xml file for UT.
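The supervisord template decision (start a DHCPv6 relay only where Config DB asks for one) can be sketched in Python. This is not the Jinja2 template itself: the `dhcpv6_servers` field name and the `isc-dhcpv6-relay-<vlan>` program naming are assumptions made for illustration.

```python
def dhcpv6_relay_programs(vlan_table):
    """Return supervisord program names for VLANs that have DHCPv6 servers configured.

    vlan_table: dict mirroring the Config DB VLAN table. The "dhcpv6_servers" field
    name and the "isc-dhcpv6-relay-<vlan>" program naming are assumptions.
    """
    programs = []
    for vlan_name, attrs in sorted(vlan_table.items()):
        # Start a DHCPv6 relay instance only if at least one server is configured.
        if attrs.get("dhcpv6_servers"):
            programs.append(f"isc-dhcpv6-relay-{vlan_name}")
    return programs


vlans = {
    "Vlan1000": {"dhcp_servers": ["192.0.2.10"], "dhcpv6_servers": ["2001:db8::10"]},
    "Vlan2000": {"dhcp_servers": ["192.0.2.10"]},
}
print(dhcpv6_relay_programs(vlans))
```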
How to verify it
Configure DHCPv6 agents as described in the feature HLD: Azure/SONiC#765
Test it with real client/server with IPv6 or use the dedicated automatic test: Azure/sonic-mgmt#3565
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Split the docker-dhcp-relay.supervisord.conf.j2 template into several files for easier code maintenance
Why I did it
Allow deploying DHCPv6 servers following the implementation PR: #7772
How I did it
Add DHCPv6 server support to minigraph.py in the sonic-cfggen tool and extend the unit test to cover this change.
How to verify it
Try to deploy a switch with DHCPv6 servers.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
Enhance DHCP monitor application following the implementation PR: https://github.com/Azure/sonic-buildimage/pull/7772
#### How I did it
Add support for monitoring DHCPv6 packets.
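Monitoring DHCPv6 traffic comes down to classifying DHCPv6 messages, whose first byte is the message type (RFC 8415). A minimal Python sketch of that classification, assuming packets have already been captured and reduced to (UDP destination port, UDP payload) pairs; it is not the DHCP monitor's actual implementation.

```python
from collections import Counter

# DHCPv6 message types (RFC 8415, section 7.3).
DHCPV6_MSG_TYPES = {
    1: "SOLICIT", 2: "ADVERTISE", 3: "REQUEST", 4: "CONFIRM",
    5: "RENEW", 6: "REBIND", 7: "REPLY", 8: "RELEASE",
    9: "DECLINE", 10: "RECONFIGURE", 11: "INFORMATION-REQUEST",
    12: "RELAY-FORW", 13: "RELAY-REPL",
}

# UDP 546: clients listen; UDP 547: servers and relay agents listen.
DHCPV6_PORTS = {546, 547}


def count_dhcpv6(packets):
    """Count DHCPv6 messages by type.

    packets: iterable of (udp_dst_port, udp_payload) tuples, payload as bytes.
    Non-DHCPv6 ports and empty payloads are ignored.
    """
    counts = Counter()
    for dst_port, payload in packets:
        if dst_port not in DHCPV6_PORTS or not payload:
            continue
        # The first byte of every DHCPv6 message (client/server or relay) is msg-type.
        counts[DHCPV6_MSG_TYPES.get(payload[0], "UNKNOWN")] += 1
    return counts


sample = [(547, bytes([1, 0x12, 0x34, 0x56])), (546, bytes([7])), (547, bytes([12])), (53, b"\x01")]
print(count_dhcpv6(sample))
```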
#### How to verify it
Install an image with this PR and the implementation PR.
- Why I did it
Currently, DHCP packets are not trapped by the COPP manager for non-ToRRouter switch types.
Even if the feature is enabled, DHCP packets won't reach the CPU, since the COPP manager does not trap these packets.
This change disables dhcp_relay by default for non-ToRRouter switches in init_cfg.json.
With this approach, a user who wants the feature on a non-ToRRouter switch must enable it manually via the 'feature' configuration.
This keeps the current approach for the MSFT production issue with DHCP relay on non-ToRRouter switches, while letting the user decide whether to use it.
- How I did it
Set dhcp_relay to 'disabled' by default in init_cfg.json for non-ToRRouter switches.
Remove the exclusion of DHCP packets from copp_cfg.json.
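init_cfg.json.j2 is a Jinja2 template; the default-state rule it now encodes can be sketched in Python as below. Only the dhcp_relay `state` field is shown (other FEATURE fields are omitted), and "LeafRouter" is used purely as an example of a non-ToRRouter device type.

```python
def dhcp_relay_default_state(device_type):
    """Default FEATURE state for dhcp_relay, per the rule described above.

    ToRRouter switches get "enabled"; any other device type gets "disabled",
    and the user must enable the feature manually via the 'feature' configuration.
    """
    return "enabled" if device_type == "ToRRouter" else "disabled"


def init_cfg_feature_entry(device_type):
    """Build the dhcp_relay part of the FEATURE table (other fields omitted)."""
    return {"FEATURE": {"dhcp_relay": {"state": dhcp_relay_default_state(device_type)}}}


print(init_cfg_feature_entry("LeafRouter"))
```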
- How to verify it
Enable the dhcp_relay feature on a non-ToRRouter switch.
Unit tests were modified so that the default value for dhcp_relay in the mocked CONFIG DB in 'test_vectors.py' is 'disabled'.
This reflects the change to 'init_cfg.json.j2'.
For ToRRouter, the state changes from 'disabled' to 'enabled'.
Another test case was added for the 'ToR' switch type, to verify that the state is 'enabled' when the user configures it so.
Why I did it
Currently, hostcfgd is implemented so that each feature being enabled or disabled triggers systemctl enable/unmask commands, which in turn trigger a 'systemctl daemon-reload'.
Each such call costs about 0.6 s, adding an overall overhead of ~12 seconds of CPU time.
This change compares the desired state of a feature with its current state in systemd and triggers a system call only when necessary.
How I did it
Check each feature's status in systemd before executing a system call to enable it and reload the systemd daemon.
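A minimal sketch of "check first, call only when needed", not the actual hostcfgd code. It queries `systemctl is-enabled` (which prints states such as "enabled", "disabled" or "masked") and issues the costly enable/unmask (or disable/mask) commands only on a mismatch. The exact command sequences are assumptions, and the `run` parameter exists so the logic can be exercised without systemd.

```python
import subprocess


def run_cmd(args):
    """Run a command and return its stripped stdout; a non-zero exit is not treated as an error."""
    return subprocess.run(args, capture_output=True, text=True).stdout.strip()


def sync_feature_state(feature, want_enabled, run=run_cmd):
    """Issue systemctl commands only when systemd's state differs from the desired one.

    Returns the commands that were issued (an empty list means nothing was executed).
    The enable/disable command sequences are assumptions for this sketch.
    """
    unit = f"{feature}.service"
    current = run(["systemctl", "is-enabled", unit])  # e.g. "enabled", "disabled", "masked"
    if want_enabled and current != "enabled":
        cmds = [["systemctl", "unmask", unit], ["systemctl", "enable", unit]]
    elif not want_enabled and current != "masked":
        cmds = [["systemctl", "disable", unit], ["systemctl", "mask", unit]]
    else:
        cmds = []  # already in the desired state: skip the costly calls
    for cmd in cmds:
        run(cmd)
    return cmds
```

For example, `sync_feature_state("lldp", True)` runs only the `is-enabled` query when lldp is already enabled.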
How to verify it
Build an image with this change and observe that fewer system calls are executed.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
NOTE: This is a cherry-pick from 1911/2012 to master.
- Why I did it
To fix LAG IP configuration race
- How I did it
Extended the timeout for teammgrd.
- How to verify it
Add more than 80 router LAGs and run config reload.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
After https://github.com/Azure/sonic-buildimage/pull/7598, packages.json generation is broken. This change fixes it and makes the whole build fail if the generation fails.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Static route configuration should not depend on BGP_ASN. Remove the dependency on BGP_ASN for StaticRouteMgr.
Fixes #8027
How I did it
Check for the BGP_ASN field before configuring static route redistribution, and wait until BGP_ASN is available to enable static route redistribution.
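The deferral described above can be sketched as a small state machine. `StaticRouteMgrSketch` and its methods are hypothetical, not the bgpcfgd StaticRouteMgr API; the `bgp_asn` key mirrors the DEVICE_METADATA field. In FRR, redistribution is configured with `redistribute static` under `router bgp <asn>`, which is why it needs the ASN while the static routes themselves do not.

```python
class StaticRouteMgrSketch:
    """Hypothetical sketch: static routes never wait for BGP_ASN, only
    their redistribution into BGP does."""

    def __init__(self):
        self.bgp_asn = None
        self.routes = {}
        self.redistribution_enabled = False

    def set_static_route(self, prefix, nexthop):
        # Configure the static route right away, independent of BGP_ASN.
        self.routes[prefix] = nexthop
        self._maybe_enable_redistribution()

    def on_device_metadata(self, metadata):
        # Called when DEVICE_METADATA changes; "bgp_asn" may appear later.
        asn = metadata.get("bgp_asn")
        if asn:
            self.bgp_asn = asn
            self._maybe_enable_redistribution()

    def _maybe_enable_redistribution(self):
        # Redistribution needs "router bgp <asn>", so wait until BGP_ASN is known.
        if self.bgp_asn and self.routes and not self.redistribution_enabled:
            self.redistribution_enabled = True
```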
How to verify it
Add unit test to cover the scenario and verify the functionality on a virtual switch.
Why I did it
systemd-sonic-generator limits multi-ASIC unit file instances to 10 (single-digit instance numbers 0-9). This limitation needs to be removed to handle more than 10 ASICs.
MAX_NUM_TARGETS and MAX_NUM_INSTALL_LINES are limited to 15, which is not sufficient for systems with more than 15 ASICs.
Inside get_unit_files(), strcmp produces incorrect results because a non-null-terminated string is being compared.
Added build UT support for systemd-sonic-generator
Updates:
888701b [Mellanox] Remove mstdump from Mellanoxs collect dump script ([Azure/sonic-utilities#1706])
4818360 [sonic-package-manager] support warm/fast reboot for extension packages ([Azure/sonic-utilities#1554])
793b847 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' ([Azure/sonic-utilities#1679])
24fe1ac [show][config] support for interface alias for muxcable commands ([Azure/sonic-utilities#1699])
186d8513 Pcieutil to load the platform api first instead of using common api (#1672)
7a82c069 [Mellanox] Update mellanox dump generation to include SDK dumps (#1640)
38f8c068 [sfputil] Expose error status fetched from STATE_DB or platform API to CLI (#1658)
c5d00ae4 [pfcwd] Fix the return code in invalid case (#1691)
57dc4032 [ci]: Fix config prompt question issue (#1693)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
The VOQ system LAG ID boundary is set in redis-chassis. This change sets it
from the database-chassis container. This fixes a timing issue in finding
the database_config.json file in the redis directory, which is created by
the database container. Since the database container usually starts after
the database-chassis container, the existence of this file is unreliable
when the command runs. Running the command in the database-chassis
container guarantees that database_config.json from the redis-chassis
directory is available, which fixes the timing issue.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
#### Why I did it
ethtool can be used to query and change settings such as speed, auto-negotiation and checksum offload on many network devices, especially Ethernet devices.
#### How I did it
Add the ethtool package to docker-platform-monitor/Dockerfile.j2.
#### Why I did it
The libpci library provides portable access to configuration registers of devices connected to the PCI bus.
#### How I did it
Update dockers/docker-platform-monitor/Dockerfile.j2.
Why I did it
Multiple builds failed in the 202012 branch.
The failures were caused by the inconsistent order of the package URLs retrieved by the command "apt-get download --print-urls".
Why I did it
We recently hit an issue during chassis bring-up where the Linux BDE attach failed with the following ioctl error:
[ 9058.585960] linux-user-bde (897363): Error: Invalid ioctl (00004c1d)
[ 9105.668237] linux-user-bde (901002): Error: Invalid ioctl (00004c1d)
Debugged with the Broadcom team, who suggested using the BCM_INSTANCE_SUPPORT flag to support multi-instance scenarios (platforms with more than one ASIC, where separate SAI/syncd docker instances each control one ASIC instance).
This flag was introduced in SDK 6.5.21 and needs to be present in both the SAI and the SAI GPL kernel module makefiles.
How I did it
Add the BCM_INSTANCE_SUPPORT flag to the GPL modules.
Why I did it
To determine which revision of pcie.yaml to use based on the BIOS version on the DellEMC S6100 platform.
Depends on: Azure/sonic-platform-common#195
How I did it
Added two revisions of pcie.yaml: pcie_1.yaml and pcie_2.yaml.
Included a platform-specific Pcie class to provide the revision of the pcie.yaml to be used by pcieutil/pcied.
How to verify it
Execute pcieutil check (Azure/sonic-utilities#1672) command and verify the list of PCIe devices displayed.
Logs: UT_logs.txt
Signed-off-by: Stepan Blyschak stepanb@mellanox.com
Why I did it
To support building DHCP relay as extension and installing it during build time.
How I did it
Created the infrastructure. Users need to define their packages in rules/sonic-packages.mk.
How to verify it
Verify together with #6531.
Before this change, a process running inside every SONiC container read the FEATURE table 'auto_restart' field and, depending on its value, decided whether the container had to be killed.
If the container was killed, the service auto-restart mechanism restarted it.
This change moves the logic from container to the host daemon - hostcfgd.
The 'auto_restart' handling is kept in supervisor-proc-exit-listener, but it is no longer required for a container that wants to support the auto-restart feature.
Refactor hostcfgd: move feature handling into a separate class.
Override the systemd service Restart= setting from hostcfgd.
Remove the default systemd Restart=always.
Signed-off-by: Stepan Blyshchak stepanb@nvidia.com
- Why I did it
Remove container orchestration logic from the container itself and leave it to the orchestrator, the host OS.
- How I did it
hostcfgd configures 'Restart=' value for systemd service.
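One way to override Restart= without editing the unit file is a systemd drop-in, sketched below. This is a hedged illustration, not hostcfgd's code: the drop-in file name `auto_restart.conf` is hypothetical, and `unit_dir` defaults to the usual administrator unit directory `/etc/systemd/system`. Mapping "disabled" to `Restart=no` matches the transcript below, where the lldp container stays exited.

```python
import os


def write_restart_override(feature, auto_restart, unit_dir="/etc/systemd/system"):
    """Write a systemd drop-in overriding Restart= for <feature>.service.

    auto_restart: FEATURE table value, "enabled" or "disabled".
    The drop-in file name "auto_restart.conf" is hypothetical.
    Returns the path of the written file.
    """
    restart = "always" if auto_restart == "enabled" else "no"
    dropin_dir = os.path.join(unit_dir, f"{feature}.service.d")
    os.makedirs(dropin_dir, exist_ok=True)
    path = os.path.join(dropin_dir, "auto_restart.conf")
    with open(path, "w") as f:
        # Drop-in settings in [Service] override the unit's own Restart= value.
        f.write(f"[Service]\nRestart={restart}\n")
    return path
```

systemd only picks up a new or changed drop-in after `systemctl daemon-reload`.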
- How to verify it
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp enabled
root@r-tigon-11:/home/admin# show feature status | grep lldp
lldp enabled enabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 20 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 5 seconds lldp
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp disabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 35 seconds lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 3 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 39 seconds ago lldp
root@r-tigon-11:/home/admin#
Advance submodule head for sonic-swss
32261636 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (Azure/sonic-swss#1786)
6c88e47a [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (Azure/sonic-swss#1781)
e86b900d [MPLS] sonic-swss changes for MPLS (Azure/sonic-swss#1686)
4c8e2b53 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (Azure/sonic-swss#1776)
36021246 [VS test stability] Skip flaky test for DPB (Azure/sonic-swss#1807)
c37cc1c5 Support for in-band-mgmt via management VRF (Azure/sonic-swss#1726)
1e3a532d Fix config prompt question issue (Azure/sonic-swss#1799)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
A recent version of contextlib2 (https://pypi.org/project/contextlib2/21.6.0/#history) has broken Python 2 compatibility, so the version pulled in by netaddr under Python 2 must be pinned; otherwise builds fail.
Co-authored-by: Tom Zhu <tom.zhu@metaswitch.com>
#### Why I did it
Support platform API 2.0 for the S5248F platform
#### How I did it
Made changes in the S5248F platform-specific directory
Co-authored-by: Arun LK <Arun_L_K@dell.com>
#### Why I did it
To ensure any environment variables which are configured in the build/test environment do not influence the behavior of sonic-py-common during unit tests. For example, variables which might be set by continuous integration pipelines.
#### How I did it
Add class-scoped pytest fixture to `TestDeviceInfo` class which stashes the current environment variables, clears them and yields. Once all the test cases in the class finish, the fixture will restore the original environment variables.
Also remove unnecessary unittest-style setup and teardown functions from interface_test.py
Advance the submodule with the following changes:
4475750 Config reload fix (#29)
cf60d5e [ci]: add proper azp (#26)
f0fbfe7 [CI] Set up CI with Azure Pipelines (#25)
879d7bd Include port default fec configuration to be included in ZTP configuration (#24)
a6ae955 Add a pre-defined plugin to download a list of files (#23)
6f0305b [MultiDB] Add multidb support to sonic-ztp (#16)
Why I did it
MMU configuration for DellEMC Z9332 systems in T0/T1 topology
How I did it
Updated config.bcm, QoS/Buffer pool and lossy/lossless profile settings
How to verify it
Verified that Dell systems boot up fine and basic test cases pass.
Why I did it
Following the discussion and requirement in the Chassis discussion forum, the asic-id field in DEVICE_METADATA should NOT be mandatory. If the "asic-id" field is not present, orchagent is started without the -i <asic_id> parameter.
Ref: https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-orchagent/orchagent.sh#L39
How I did it
Added a check of whether the asic-id is valid, and update the asic-id field in DEVICE_METADATA accordingly.
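The orchagent start-up rule can be sketched in Python (orchagent.sh itself is a shell script). The `asic_id` key name and the notion that "valid" means non-empty are assumptions of this sketch, and `["-d", "/var/log/swss"]` is used only as illustrative base arguments.

```python
def build_orchagent_args(device_metadata, base_args):
    """Return orchagent arguments, adding "-i <asic_id>" only for a valid asic_id.

    device_metadata: dict mirroring DEVICE_METADATA|localhost.
    The key name "asic_id" and "valid means non-empty" are assumptions of this sketch.
    """
    args = list(base_args)
    asic_id = str(device_metadata.get("asic_id") or "").strip()
    if asic_id:
        args += ["-i", asic_id]
    return args


print(build_orchagent_args({}, ["-d", "/var/log/swss"]))
```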