When a package is referenced from the web through wget command,
it downloads the package for every build.
This feature caches all the packages that are being downloaded from the
web, so that subsequent build always loads the cache instead of from
web.
* [202205] Update SOC properties for DLR_INIT based pfcwd recovery (#15217)
Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates
Signed-off-by: Neetha John <nejo@microsoft.com>
* AclInterface and Management Interfaces are parsed on finding first valid node for it.
Above logic works for multi-asic scenarios where ACL Interface and Management Interfaces are present in DPG order {Host, Asicx, Asicy} but not when DPG is in {Asicx, Asicy, Host} order.
* [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route
Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.
What I did:
Added change to add 'peerType' as element in NEIGH_STATE_TABLE.
'peerType' can be i-BGP vs e-BGP determined based on local and remote AS number.
Why I did:
This is useful to filter neighbors in SONiC as internal vs external in chassis use-case (example: telemetry)
Verification:
Manual Verification
127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.5"
1) "state"
2) "Established"
3) "peerType"
4) "e-BGP"
127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|2603:10e2:400::4"
1) "state"
2) "Established"
3) "peerType"
4) "i-BGP"
Also sonic-mgmt test case test_bgp_fact.py is enhanced: Enhanced bgp_fact to validate NEIGH_STATE_TABLE element 'peerType' sonic-mgmt#8462
Stop authorization after user being rejected by server.
#### Why I did it
Fix nss_tacplus bug: after user being rejected by one TACACS+ server, nss_tacplus will try with next TACACS+ server.
##### Work item tracking
- Microsoft ADO :15276692
#### How I did it
Check authorization result, stop authorization after user being rejected by server.
#### How to verify it
Pass all E2E test.
Create new UT: https://github.com/sonic-net/sonic-mgmt/pull/8345
#### Description for the changelog
Stop authorization after user being rejected by server.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
- Why I did it
If you enable feature and then disable it, System Ready status change to Not Ready
A disabled feature should not affect the system ready status.
- How I did it
During the disable flow of dhcp_relay, it entered the dnsrvs_name list, which caused the SYSTEM_STATE key to be set to DOWN. Right after that, the dhcp_relay service was removed from the full service list, however, but, when it was removed from the dnsrvs_name, there was no flow to reset the system state back to UP even though there was no more services in down state.
- How to verify it
root@qa-eth-vt01-2-3700v:/home/admin# config feature state dhcp_relay enabled
root@qa-eth-vt01-2-3700v:/home/admin# show system-health sysready-status
root@qa-eth-vt01-2-3700v:/home/admin# config feature state dhcp_relay disabled
root@qa-eth-vt01-2-3700v:/home/admin# show system-health sysready-status
Should see
System is ready
- Why I did it
interfaces-config service restarts networking service, which in-turn results in loopback interface address is being removed and reassigned back
If the system-health happens to start during that instance expections and logs like this are seen:
Apr 15 18:14:49.357869 r-panther-20 ERR healthd: update system status exception:Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:49.429778 r-panther-20 ERR healthd: subscribe_statedb exited- Unable to connect to redis: Cannot assign requested address
Apr 15 18:14:52.218594 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:52.219714 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218636 r-panther-20 ERR healthd: system_service_Map_base::at
Apr 15 18:14:55.218722 r-panther-20 ERR healthd: system_service_Map_base::at
- How I did it
use unix socket path
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
Introduce a new valid neighbor element type to YANG.
Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.
How to verify it
Passes UTs
- Run pre-commit tox profile to trim all trailing blanks
- Use several commits with a per-folder based strategy
to ease their merge
Issue #15114
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
What I did:
Updated Static Route Attribute in Minigraph. NGS Minigraph has define semantics of static route differently.
See below for differences:-
Microsoft ADO: 17956325
Before
<AssociatedTo>8.0.0.1/32</AssociatedTo>
<Address>192.168.1.2,192.168.2.2</Address>
<AttachTo>PortChannel40,PortChannel50</AttachTo>
Now:
<Address>8.0.0.1</Address>
<AttachTo>PortChannel40,192.168.1.2;PortChannel50,192.168.2.2</AttachTo>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Update PG headroom settings ports based on port speed/cable length
* Updated XOFF settings to use chip level numbers than core
* Updated PG headroom based on uplink/downlink side
* fix for sonic-config-gen tests
* More fixes for unit test cases
* more test fixes
* Merged multiple functions into one
Add new Nokia build target and establish an arm64 build:
Platform: arm64-nokia_ixs7215_52xb-r0
HwSKU: Nokia-7215-A1
ASIC: marvell
Port Config: 48x1G + 4x10G
How I did it
- Change make files for saiserver and syncd to use Bulleseye kernel
- Change Marvell SAI version to 1.11.0-1
- Add Prestera make files to build kernel, Flattened Device Tree blob and ramdisk for arm64 platforms
- Provide device and platform related files for new platform support (arm64-nokia_ixs7215_52xb-r0).
Following changes are done:
Added Support where if asic configuration is not present in minigraph sonic-cfggen do not error out but instead process it gracefully.
Use Case: In Supervisor we have number of asic are define as max possible but in minigraph configuration of only valid/available asics only are present. Without this change load_minigraph fails.
Microsoft ADO: 17956325
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
Our k8s feature will pull new version container images for each upgrade, the container images inside sonic will be more and more, but for now we don’t have a way to clean up the old version container images, the disk may be filled up. Need to add cleaning up the old version container images logic.
Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove the old version container images besides the feature's current version and last version image, last version image is saved for supporting fallback.
How to verify it
Check whether the old version images are removed
Why I did it
Fix#15000
isc-dhcp 4.4.1-2.3+deb11u1 is no longer available in debian repository
How I did it
update isc-dhcp to new version 4.4.1-2.3+deb11u2
Why I did it
replace yaml.load to yaml.safe_load because yaml.safe_load is more secure
Work item tracking
Microsoft ADO (number only): 15022050
How I did it
How to verify it
Verified in DUT 201911 which yaml version < 5.1
- Why I did it
Add fan direction check to system health, all fans should be in the same direction
- How I did it
Add fan direction check to system health, all fans should be in the same direction
- How to verify it
Manual test
Unit test
Added sonic-mgmt test case to verify
Why I did it
To support BGPMon sessions from each T2 linecard ASIC
Work item tracking
Microsoft ADO (number only): 17873174
How I did it
Added change in BGPMon configuration to use Loopback4096 as source interface, since this has a unique IP per ASIC.
How to verify it
Tested by manually setting up BGPMon session on T2 LC and verified that Loopback4096 could be used as source
What I did:
In FRR command update source <interface-name> is not at address-family level. Because of this
internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result
TSA over iBGP for Ipv6 was not getting applied.
How I verify:
Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.
Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
There are chassis-packet and Single asic platforms which support this 400G to 100G/40G speed change via config.
Enabling this feature for all platforms which can support this. Keeping it enabled for all does not affect the platforms
which do not support this feature yet.
Signed-off-by: anamehra anamehra@cisco.com
Why I did it
Do not require leafref as part of yang. Only need string to compare whether string received from event matches what is possible for ifname.
How I did it
How to verify it
Run UT
Why I did it
Update ECN settings for T2 chassis
How I did it
Updated qos config file to load these settings during switch bootup
How to verify it
Verified on line card on T2 chassis
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module
Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards
TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)
Changes also include no-stats option with TSC for quick retrieval of the current system isolation state
This PR also reverts changes in #11403
How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing
Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
Why I did it
systemd stop event on service with timers can sometime delete the state_db entry for the corresponding service.
Note: This won't be observed on the latest master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers
root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
Expected
Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status
System is not ready - one or more services are not up
Service-Name Service-Status App-Ready-Status Down-Reason
---------------------- ---------------- ------------------ -------------
<Truncated>
snmp Down Down Inactive
ssh OK OK -
swss OK OK -
syncd OK OK -
sysstat OK OK -
teamd OK OK -
telemetry OK OK -
what-just-happened OK OK -
ztp OK OK -
<Truncated>
How I did it
Happens because the timer is usually a PartOf service and thus a stop on service is propagated to timer. Fixed the logic to handle this
Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ]
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]
Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ]
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
* Support ACL interface type BmcData in minigraph parser
* Support ACL interface type BmcData in minigraph parser
* add unittest
* Add a global dict for storing the defination of custom acl tables
- Why I did it
Extend the CRM YANG model with DASH attributes.
- How I did it
Add new attributes to the existing CRM YANG model.
Implement tests for DASH CRM attributes.
- How to verify it
Build sonic_yang_models packages. The tests will be run automatically.
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
#### Why I did it
When user enable TACACS per-command authorization, and run a command with wildcard , if the command match more than hundreds of files, the per-command authorization will failed with following message:
*** authorize failed by TACACS+ with given arguments, not executing
The root cause of this issue is because bash will match files with wildcard and replace with wildcard args with matched files. when there are too many files, TACACS plugin will generate a big authorization request, which will be reject by server side.
##### Work item tracking
- Microsoft ADO **(number only)**: 18074861
#### How I did it
Fix bash patch file, use original user inputs as authorization parameters.
#### How to verify it
Pass all UT.
Create new UT to validate the TACACS authorization request are using original command arguments.
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8115
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [X] 202205
- [X] 202211
#### Tested branch (Please provide the tested image version)
- [x] 202205.258490-412b83d0f
- [x] 202211.71966120-1b971c54b5
#### Description for the changelog
Fix per-command authorization failed issue when a command with wildcard match more than hundred files.
#### Why I did it
sonic-config-engine unit test needs to detect missing yang models.
#### How I did it
Update unit test, return error for missing yang models.
#### How to verify it
Run unit test for sonic-config-engine
#### Why I did it
Yang model for NEIGH table was missing
Fixed https://github.com/sonic-net/sonic-buildimage/issues/13971
#### How I did it
added sonic-neigh.yang model
#### How to verify it
make buildimage
#### Description for the changelog
Adding NEIGH yang model
Signed-off-by: Stepan Blyschak stepanb@nvidia.com
DEPENDS: #12852
Why I did it
To support BGP pending FIB suppression.
How I did it
I backported patches from FRR 8.4 feature that allows communicating ASIC route status back to FRR.
Also, added a new field in DEVICE_METADATA YANG model table. Added UT for YANG model changes.
How to verify it
Run on the switch.
Open
[minigraph] add support for changing T1 ports speed from 400G to 100G and vice-versa
#14505
vdahiya12 wants to merge 9 commits into sonic-net:master from vdahiya12:dev/vdahiya/minigraph_parser
Conversation 10
Commits 9
Checks 18
Files changed 5
Conversation
vdahiya12
@vdahiya12 vdahiya12 commented 2 weeks ago •
On SONiC T1 cisco 8101 HwSku, the speed changes are done from 400G to 100G needs to be supported on 400G ports.
To enable this, along with speed change the port lanes need to be changed. This PR has the changes to update the port lanes when such speed change happens.
Basically if Banwidth in minigraph.xml intends to enable a 100G speed on a 400G port, then the appropriate lane change and speed change needs to be invoked in mingraph parser
Example if port_config.ini dicatates the speed to be 400G and minigraph has 100G speed, then this changeneeds to be accommodated
# name lanes alias index speed channel
Ethernet96 1536,1537,1538,1539,1540,1541,1542,1543 etp12 12 400000 0
<DeviceLinkBase>
<ElementType>DeviceInterfaceLink</ElementType>
<EndDevice>ARISTA01T2</EndDevice>
<EndPort>Ethernet1</EndPort>
<StartDevice>Device-8101-01</StartDevice>
<StartPort>etp12</StartPort>
<Bandwidth>100000</Bandwidth>
</DeviceLinkBase>
These platforms today have 400g port with 8 serdes lines, and 100g will operate with 4 serdes lane. When the port speed changes from 400G to 100G the first 4 lanes will be used for 100G port.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* Fix backend port channels and routes being displayed
In `show interface portchannel` and `show ip route`, backend port
channels and routes were being displayed. This is due to changes in #13660.
Fix these issues by switching to reading from PORTCHANNEL_MEMBERS table
instead.
Fixes#14459.
* Replace table name with constant
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
Implementing code changes for https://github.com/sonic-net/SONiC/pull/1203
#### How I did it
Removed the timers and delayed target since the delayed services would start based on event driven approach.
Cleared port table during config reload and cold reboot scenario.
Modified yang model, init_cfg.json to change has_timer to delayed
#### How to verify it
Running regression
Why I did it
Optimize the version control for Debian packages.
Fix sonic-slave-buster/sources.list.amd64 not found display issue, need to generate the file before running the shell command to evaluate the sonic image tag.
When using the snapshot mirror, it is not necessary to update the version file based on the base image. It will reduce the version dependency issue, when an image is not run when freezing the version.
How I did it
Not to update the version file when snapshot mirror enabled.
How to verify it
Why I did it
refine reproducible build.
How I did it
Fix reset map variable in bash.
Ignore empty web file md5sum value.
If web file didn't backup in azure storage, use file on web.
How to verify i
- Why I did it
To solve an issue with upgrade with fast-reboot including FW upgrade which has been introduced since moving to fast-reboot over warm-reboot infrastructure.
As well, this introduces fast-reboot finalizing logic to determine fast-reboot is done.
- How I did it
Added logic to finalize-warmboot script to handle fast-reboot as well, this makes sense as using fast-reboot over warm-reboot this script will be invoked. The script will clear fast-reboot entry from state-db instead of previous implementation that relied on timer. The timer could expire in some scenarios between fast-reboot finished causing fallback to cold-reboot and possible crashes.
As well this PR updates all services/scripts reading fast-reboot state-db entry to look for the updated value representing fast-reboot is active.
- How to verify it
Run fast-reboot and check that fast-reboot entry exists in state-db right after startup and being cleared as warm-reboot is finalized and not due to a timer.
#### Why I did it
When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds.
Fix issue https://github.com/sonic-net/sonic-buildimage/issues/14381
#### How I did it
The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port.
Implement the op in LACP runner to disable the port which leads to proper LACPDU send.
#### How to verify it
Set LAG between 2 switches.
Set LAGs to be router port and set ip address.
In switch A send ping to ip address of LAG in switch B.
In switch B, while ping is running remove port from LAG.
Verify ping is not stopping.
#### Why I did it
Fix issue: wrong teamd link watch state after warm reboot due to TEAM_ATTR_PORT_CHANGED lost
The flag TEAM_ATTR_PORT_CHANGED is maintained by kernel team driver:
- a flag "changed" is maintained in struct team_port struct
- the flag is set by __team_port_change_send once relevant information is updated, including port linkup (together with speed, duplex), adding or removing
- the flag is cleared by team_nl_fill_one_port_get once the updated information has been notified to user space via RTNL
In the userspace, the change flag is maintained by libteam in struct team_port.
The team daemon calls port_priv_change_handler_func on receiving port change event.
The logic in port_priv_change_handler_func
1. creates the port if it did not exist, which triggers port add event and eventually calls lacp_port_added callback.
2. triggers port change event if team_port->changed is true, which eventually calls lw_ethtool_event_watch_port_changed to update port state for link watch ethtool.
3. removes the port if team_port->removed is removed
In lacp_port_added, it calls team_refresh to refresh ifinfo, port info, and option info from the kernel via RTNL.
In this step, port_priv_change_handler_func is called recursively.
- In the inner call, it won't get TEAM_ATTR_PORT_CHANGED flag because kernel has already notified that.
- As a result, team_port->changed flag is cleared in the libteam.
- The port change event won't be triggered from either inner or outer call of port_priv_change_handler_func.
If the port has been up when the port is being added to the team device, the "port up" information is carried in the outer call but will be lost.
In case the flag TEAM_ATTR_PORT_CHANGED is set only in the inner call, function port_priv_change_handler_func can be called in the inner call.
However, it will fail to fetch "enable" options because option_list_init has not be called.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### How I did it
Fix:
Do not call check_call_change_handlers when parsing RTNL function is called from another check_call_change_handlers recursively.
#### How to verify it
- Manually test
- Regression test
- warm reboot
- warm reboot sad lag
- warm reboot sad lag member
- warm reboot sad (partial)
#### Why I did it
Fixes#14277.
Fixes the inconsistent fallback behaviour for RADIUS authentication when AAA authentication is configured as "radius, local".
#### How I did it
Modified common-auth-sonic.j2 template to make sure that when no RADIUS servers are configured (with AAA authentication login method set to radius, local), the system falls back to local authentication successfully.
#### How to verify it
1. Configure authentication based on RADIUS and local.
config aaa authentication login radius local
2. Configure an unreachable RADIUS server.
config radius add 6.6.6.6
3. Try to login to switch with existing admin user credentials. This is successful.
4. Remove RADIUS server configuration.
config radius delete 6.6.6.6
5. Try to login to switch with admin user credentials. This is successful.
Based on the port breakout HLD, we are now using subport instead of channel in the CONFIG_DB PORT table to handle port breakout. The yang schema needs to be modified accordingly to handle the corresponding change.
The corresponding code changes have been merged through sonic-net/sonic-platform-daemons/pull/342 merged
Signed-off-by: Mihir Patel <patelmi@microsoft.com>
Why I did it
Support Egress Mirroring on supported Arista platforms
How I did it
Add necessary soc properties for egress mirroring recycle ports to be created
Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Change references to use bullseye instead of buster
Why I did it
Almost all daemons in 202211 and master uses bullseye, and NAT seems easy to migrate.
How I did it
Replaced the references, built with 202211 branch.
How to verify it
Not sure, it builds and tests pass as far as I can tell but I don't use the feature myself.
Signed-off-by: Christian Svensson <blue@cmd.nu>