The display of azure pipeline is not specific now, such as when the step Run test fails, the display of itself shows successful, but the display of step Kvmdump shows fails, but actually, the step Kvmdump doesn't fail. I improve the display of azure pipeline in this pr, each step has its own success or failure, and is shown in azure pipeline.
Why I did it
The display of azure pipeline is not specific now, such as when the step Run test fails, the display of itself shows successful, but the display of step Kvmdump shows fails, but actually, the step Kvmdump doesn't fail. I improve the display of azure pipeline in this pr, each step has its own success or failure, and is shown in azure pipeline.
How I did it
Each step has its own signature of success or failure.
Using the chain of responsibility pattern to manage all status.
Modify the expected-state in each step.
- Why I did it
To include latest fixes and new functionality
* SAI
1. Temporary WA for query enum capabilities for tunnel peer mode, to not return P2P
2. sai debug dump returns while last extra dump is running
3. open inner SRC and DST IP for ECMP / LAG general hash objects
4. tunnel peer mode returns hard coded
5. tunnel decap dscp mode
6. support default tunnel src ip
7. failure to add a port to a LAG in VLANs configured with flood_ctrl
8. Add P2P peer mode for IP in IP tunnels
9. Add per port IP counters
10. Clean up VXLAN srcport static (XML) functionality, as only dynamic (API) is in use
11. Fix enum capabilities of native hash fields
12. sai_acl_db_group_ptr usage
13. Clean QoS config of the LAG when all members was removed (bug
* SDK/FW
1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module.
2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
5. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error.
6. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
7. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work.
8. On SN2201 system, on RJ45 port, the link might appear in 'down' state even if it operations properly.
9. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event.
10. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port.
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Update sonic-utilities submodule pointer to include the following:
* c9ed09d [202205] [sonic_installer] use /etc/resolv.conf from the host when migrating packages (#2573) ([#2575](https://github.com/sonic-net/sonic-utilities/pull/2575))
Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-utilities submodule pointer to include the following:
3bc2bc6 [Mellanox][202205] Change severity to NOTICE in Mellanox buffer migrator when unable to fetch DEVICE_METADATA due to empty CONFIG_DB during initialization (#2570)
e1c8243 [202205][generate_dump] Fix for a deletion flow for all secret files in the techsupport dump (#2572)
9f2984a [202205] Fix issue: unconfigured PGs are displayed in watermarkstat (#2568)
f7988b0 [202205] [timer.unit.j2] use wanted-by in timer unit (#2561)
f45dcfb [generate_dump] Optimize the execution time of 'show techsupport' CLI by paraller function execution (#2565)
67cbb15 [202205]Fixes 12170: Delete subinterface and recreate the subinterface in default-vrf (#2564)
93172c4 [202205] [generate_dump] Optimize the execution time of the 'show techsupport' script to 5-10% by reducing calls to the 'tar append' operation (#2562)
Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
why
In order to apply different config across different platform, and use the code with a unified format, reuse syncd init script to init saiserver.
How I did it
how
Reuse syncd init script
How to verify it
Test
Test in DUT s6000 and dx010 with sonic 202205
Why I did it
Unify the Debian mirror sources
Make easy to upgrade to the next Debian release, not source url code change required. Support to customize the Debian mirror sources during the build
Relative issue: #12523
How I did it
How to verify it
Avoid traceback on sonic-clear command
sonic-clear dhcp6relay_counters
Traceback (most recent call last):
File "/usr/local/bin/sonic-clear", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/clear/plugins/dhcp-relay.py", line 19, in dhcp6relay_clear_counters
counter = DHCPv6_Counter()
NameError: name 'DHCPv6_Counter' is not defined
- How I did it
Corrected the way to import using importlib
- How to verify it
Tested the sonic-clear command and verified no traceback is seen
Why I did it
Currently sonic-slave-* tag is confusing. Set correct tag on sonic-slave-* image.
Fix job name to fit the build.
How I did it
build amd image in amd64:
sonic-slave-bullseye:cfe29bff67c
sonic-slave-bullseye:latest
sonic-slave-bullseye:master
build armhf image in amd64:
sonic-slave-bullseye-march-armhf:33614806dc3
sonic-slave-bullseye-march-armhf:latest
sonic-slave-bullseye-march-armhf:master
build arm64 image in amd64:
sonic-slave-bullseye-march-arm64:f3b1b16c801
sonic-slave-bullseye-march-arm64:latest
sonic-slave-bullseye-march-arm64:master
build arm64 image in arm64:
sonic-slave-bullseye:75cb326c9a7
sonic-slave-bullseye-arm64:latest
sonic-slave-bullseye:master
build armhf image in armhf:
sonic-slave-bullseye:64d178951fc
sonic-slave-bullseye-armhf:latest
sonic-slave-bullseye:master
How to verify it
Why I did it
Advance SAI Redis head pointer
How I did it
changes:
sonic-net/sonic-sairedis@cf679e7sonic-net/sonic-sairedis@8d6688e
[202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185 sonic-net/sonic-sairedis@66f2961
remove parameter --skip_error, which removed from [202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185
How to verify it
local image build
Why I did it
Makefile needs some dependencies from the Internet. It will fail for network related issue.
Retries will fix most of these issues.
How I did it
Add retries when running commands which maybe related with networking.
How to verify it
This PR is to backport #13100 to the 202205 branch since can not be cleanly cherry-picked.
Following code to judge whether a process is running inside a docker could get stuck on the simx platform
subprocess.Popen(["docker", "--version"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True)
When it gets stuck, the config-chassisdb service can not be successfully started, thus the system can not be booted up.
root@sonic:/# service config-chassisdb status
config-chassisdb.service - Config chassis_db
Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago
Main PID: 571 (config-chassisd)
Tasks: 14 (limit: 9501)
Memory: 132.4M
CGroup: /system.slice/config-chassisdb.service
├─571 /bin/bash /usr/bin/config-chassisdb
├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform
├─602 /bin/sh -c sudo decode-syseeprom -m
├─603 sudo decode-syseeprom -m
├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m
├─616 /bin/sh -c docker --version 2>/dev/null
└─617 docker --version
- How I did it
Use an alternative way to implement this function and issue can be avoided:
docker_env_file = '/.dockerenv'
return os.path.exists(docker_env_file) is False
- How to verify it
run regression on real hardware and simx platform.
Why I did it
In order to prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test failing on the log analyzer step.
The mentioned test is performing the sfputil reset EthernetX for every interface on the SONiC switch, this action will flap the SFP device status (INSTERTED -> REMOVED -> INSTERTED).
The SONiC XCVRD daemon will catch this SFP device status change (because it is monitoring the presence status of the cable).
To judge the cable presence status, currently, we are still leveraging to read the first bytes of the EEPROM, and the EEPROM could be not ready at some moment and the SONiC XCVRD daemon will print the error log to Syslog:
ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register
How I did it
Change logging severity from ERR to WARNING
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
Add missing system_ref_core_clock_khz in Arista-7800R3A-36D2-C36 and Arista-7800R3A-36D2-C72
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
This PR is a required for changing the L3 IP forwarding Behavior to SoC in active-active toplogy.
Basically a src IP is added to the SNAT rule so that only packets originating from ToR with src IP as vlan IP get natted by the rule and change the src IP to LoopBack IP
Master Branch PR with combined change is here
sonic-net/sonic-host-services#3
How I did it
check the config DB if the ToR is a DualToR and has an SoC IP assigned.
put an iptable rule
iptables -t nat -A POSTROUTING --destination -j SNAT --to-source "
Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Port indexes of front panel ports are not contiguous in multi-asic because we didn't distiguish between
front panel and internal ports, e.g., recycle ports. Fix this by assigning index to front panel port first
and then internal ports.
Co-authored-by: Song Yuan <64041228+ysmanman@users.noreply.github.com>
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.
How I did it
add coalesce-time 10000 to the frr bgp startup config.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Why I did it
To bring the following fixes:
Applied customer patch for limited port breakout support in rel_ocp_sai_7_1.
Revert "Merged PR 7224850: SAI7.1_DNX: Temp workaround for Nexthop Group Scale Issue(CS00012251649)".
backport SONIC-67662 to SAI7.1:JR2C ECMP partition for NHgroup members.
How I did it
Updated SAI code with the fixes above.
How to verify it
Run the SONiC and SAI test with the SAI pipeline.
Add missing aggregate port_config.ini needed by sonic-mgmt
Concatenate the ASIC specific port_config.ini from device/arista/x86_64-arista_7800r3a_36d2_lc/Arista-7800R3A-36D2-C36/[01] to create the aggregate file.
Why I did it
Update Marvell armhf sai debian version. Includes supports for,
Autoneg
Everflow
How to verify it
Compile and load SONiC armhf bin
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
- Why I did it
Fixed sflow yang model to include collector_vrf field.
- How I did it
Added leaf for collector_vrf under sflow_collector. Additionally aligned the configuration guide
- How to verify it
Added UT to verify.
Debian is shipping a systemd timer unit for logrotate, but we're also
packaging in a cron job, which means both of them will run, potentially
at the same time. Remove our cron file, and add an override to the
shipped timer file to have it be run every 10 minutes.
Fixes#12392.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
PR #12829 modified the docker tagging scheme such that optional docker
containers would be tagged with the SONiC image version. However, the
docker-image-load macro wasn't updated for these changes. Update it
here.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This change is to disable the pcie firmware check done by Broadcom SAI. The change is needed for the Arista platform x86_64-arista_7050cx3_32s; otherwise, the check will fail, blocking the initialization.
There was a pcie firmware check added in brcm SDK and certain Arista hardwares do not compliant with the check, so we added the disable_pcie_firmware_check originally for x86_64-arista_7060dx4_32. For x86_64-arista_7050cx3_32s, it was able to pass the check but some firmware change done in August made it fail.