- Why I did it
The field 'subport' represents the index of the split port within a physical port. For example, if a port is split into 4, the subport of the first logical port is 1, the subport of the second logical port is 2, and so on.
In xcvrd, the CMIS manager uses the subport to calculate the lane mask, which is used to control the data path per lane. In Nvidia platform, the subport is missing and is always set to 0. According to the xcvrd code, when subport=0, it will always correspond to the first logical port. Therefore, if we shut down any logical port that is not the first one, we will see the operational status of the first logical port also becomes down.
This PR aims to add the subport field to CONFIG DB and prevent such scenarios. This is applicable only for static default breakout mode. For DPB, subport calculation will happen on the fly (changes are not in Sonic yet).
(Subport HLD: HLD of subport: [link to the HLD document])
- How I did it
I have added the 'subport' field to all relevant Nvidia hwsku.json files (minigraph generation is based on them). Additionally, I introduced the new 'subport' field to portconfig.py, so that sonic-cfggen will be able to generate the minigraph with it. In this file, I also fixed an error that caused all attributes from hwsku.json to be applied only to the first logical ports associated with a physical port.
Furthermore, I updated hwsku_json_checker to include the new field and applied a fix to the sample_hwsku.json file. sample_hwsku.json is the file that sonic-config-engine's unit tests rely on for its tests. Previously, it only included attributes for the first logical port of a split physical port. For example, if Ethernet4, a 4-lane port, was split into 2 ports, then sample_hwsku.json included only the entry for Ethernet4, with no entry for Ethernet6. This misalignment with the structure of other hwsku.json files has been corrected as well.
- How to verify it
Ensure that each logical port has the correct value of 'subport' in CONFIG DB, and that shutting down a logical port affects only that port and not other ports in the split.
Fix IPV6 forced-mgmt-route not work issue
Why I did it
IPV6 forced-mgmt-route not work
When add a IPV6 route, should use 'ip -6 rule add pref 32764 address' command, but currently in the template the '-6' parameter are missing, so the IPV6 route been add to IPV4 route table.
Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue.
Microsoft ADO (number only):24719238
#### Why I did it
SNMP query over IPv6 does not work due to issue in net-snmp where IPv6 query does not work on multi-nic environment.
To get around this, if snmpd listens on specific ipv4 or ipv6 address, then the issue is not seen.
We plan to configure Management IP and Loopback IP configured in minigraph.xml as SNMP_AGENT_ADDRESS in config_db., based on changes discussed in https://github.com/sonic-net/SONiC/pull/1457.
##### Work item tracking
- Microsoft ADO **(number only)**:26091228
#### How I did it
Modify minigraph parser to update SNMP_AGENT_ADDRESS_CONFIG with management and Loopback0 IP addresses.
Modify snmpd.conf.j2 to use SNMP_AGENT_ADDRESS_CONFIG table if it is present in config_db, if not listen on any IP.
Main change:
1. if minigraph.xml is used to configure the device, then snmpd will listen on mgmt and loopback IP addresses,
2. if config_db is used to configure the device, snmpd will listen IP present in SNMP_AGENT_ADDRESS_CONFIG if that table is present, if table is not present snmpd will listen on any IP.
#### How to verify it
config_db.json created from minigraph.xml for single asic VS image with mgmt and Loopback IP addresses.
```
"SNMP_AGENT_ADDRESS_CONFIG": {
"10.1.0.32|161|": {},
"10.250.0.101|161|": {},
"FC00:1::32|161|": {},
"fec0::ffff:afa:1|161|": {}
},
.....
snmpd listening on the above IP addresses:
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp 0 0 127.0.0.1:3161 0.0.0.0:* LISTEN 71522/snmpd
udp 0 0 10.250.0.101:161 0.0.0.0:* 71522/snmpd
udp 0 0 10.1.0.32:161 0.0.0.0:* 71522/snmpd
udp6 0 0 fec0::ffff:afa:1:161 :::* 71522/snmpd
udp6 0 0 fc00:1::32:161 :::* 71522/snmpd
```
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
#### Why I did it
When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface.
After investigation, I found the IPV6 'default' route table does not add to route lookup:
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
admin@vlab-01:~$
As compare:
admin@vlab-01:~$ ip -4 rule list
1001: from all lookup local
32764: from all to 172.17.0.1/24 lookup default
32765: from 10.250.0.101 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table exist in IPV4 route lookup
Issue fix by add 'default' route table to route lookup with following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$
##### Work item tracking
- Microsoft ADO: 25798732
#### How I did it
When management interface using 'default' route table, add 'default' route table to IPV6 route lookup.
#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:
### Tested branch (Please provide the tested image version)
- [x] master-17281.417570-2133d58fa
#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.comCloses#17345
This W/A was proposed by Nvidia FRR team before the long term solution is ready.
Why I did it
A W/A to fix default route installation during LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
Modify j2 template files in docker-dhcp-relay. Add dhcprelayd to group dhcp-relay instead of isc-dhcp-relay-VlanXXX, which would make dhcprelayd to become critical process.
In dhcprelayd, subscribe FEATURE table to check whether dhcp_server feature is enabled.
2.1 If dhcp_server feature is disabled, means we need original dhcp_relay functionality, dhcprelayd would do nothing. Because dhcrelay/dhcpmon configuration is generated in supervisord configuration, they will automatically run.
2.2 If dhcp_server feature is enabled, dhcprelayd will stop dhcpmon/dhcrelay processes started by supervisord and subscribe dhcp_server related tables in config_db to start dhcpmon/dhcrelay processes.
2.3 While dhcprelayd running, it will regularly check feature status (by default per 5s) and would encounter below 4 state change about dhcp_server feature:
A) disabled -> enabled
In this scenario, dhcprelayd will subscribe dhcp_server related tables and stop dhcpmon/dhcrelay processes started by supervisord and start new pair of dhcpmon/dhcrelay processes. After this, dhcpmon/dhcrelay processes are totally managed by dhcprelayd.
B) enabled -> enabled
In this scenaro, dhcprelayd will monitor db changes in dhcp_server related tables to determine whether to restart dhcpmon/dhrelay processes.
C) enabled -> disabled
In this scenario, dhcprelayd would unsubscribe dhcp_server related tables and kill dhcpmon/dhcrelay processes started by itself. And then dhcprelayd will start dhcpmon/dhcrelay processes via supervisorctl.
D) disabled -> disabled
dhcprelayd will check whether dhcrelay processes running status consistent with supervisord configuration file. If they are not consistent, dhcprelayd will kill itself, then dhcp_relay container will stop because dhcprelayd is critical process.
This is change taken as part of the HLD: sonic-net/SONiC#1470.
In this PR we add the logic to parse the SecondarySubnets field in the minigraph and add a flag in "secondary" in the vlan_interface table of the config db.
Microsoft ADO (number only): 16784946
How I did it
Made changes in the minigraph.py to parse the xml entry and add the parsed value to the config db
How to verify it
Added python tests in the sonic-config-engine folder to test the config db entries.
This is change taken as part of the HLD: sonic-net/SONiC#1470 and this is a follow up on the PR #16827 where in the docker-dhcp we pick the value of primary gateway of the interface from the VLAN_Interface table which has "secondary" flag set in the config_db
Microsoft ADO (number only): 16784946
How did I do it
- Changes in the j2 file to add a new "-pg" parameter in the dhcpv4-relay.agents.j2, the ip would be retrieved from the config db's vlan_interface table such that the interface which are picked will have secondary field set.
- Changes in isc-dhcp to re-order the addresses of the discovered interface and which has the ip which has the passed parameter.
- Why I did it
PR checker is blocked by container_checker.
- How I did it
Disable telemetry in minigraph parser.
- How to verify it
Run pipeline and sanity check.
Why I did it
The current DEVICE_NEIGHBOR_METADATA yang model has two issues that would block GCU operation when it checks if the current config aligns with the YANG model:
Missing cluster field in YANG
Incomplete set of device type. The device type in YANG model doesn't include all the device type.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the UT test accordingly.
How to verify it
Build the image and verify the unit tests passed.
Why I did it
Enable the suppress fib feature by default.
Work item tracking
Microsoft ADO (25564723):
How I did it
In minigraph.py, to add the field suppress-fib-pending, and enable it for leafrouter.
How to verify it
Build / load image and check the config_db by show CLI.
admin@str-7260cx3-acs-2:~$ show suppress-fib-pending
Enabled
Need to modify the tests/bgp/test_bgp_suppress_fib.py in sonic-mgmt repo, to check the config before restore. Otherwise, after this test, it will turn off the suppress-fib-pending.
sonic-net/sonic-mgmt#10612
Why I did it
To avoid orchagent crash issue like sonic-net/sonic-swss#2935, disable unsupported counters on SONiC management devices.
Work item tracking
Microsoft ADO (number only): 25437720
How I did it
Update the minigraph parser to disable unsupported counters on management devices.
How to verify it
Verified by unittest.
Manually apply patch to DUT and do config load_minigraph
Microsoft ADO (25266920)
sonic-mgmt xoff test was failing for [100g,120km]. Needed to update total headroom pool size when 100G line card is used as T2 uplink.
This size was calculated assuming 100g is used for downlink so cable length was 2km whereas it can also be used for uplink (cable length - 120km). so we need to do calculation based on 120km not 2km. Although it will be some wastage for 2km scenario but it should cover both cases.
In #15080, there was a command added to re-add 127.0.0.1/8 to the lo
interface when the networking configuration is being brought down.
However, the trigger for that command is `down`, which, looking at
ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is
removed. This means there may be a period of time where there are no
loopback addresses assigned to the lo interface, and redis commands will
fail.
Fix this by changing this to pre-down, which should run well before
127.0.0.1/16 is removed, and should always leave lo with a loopback
address.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Change the CAK key length check in config plugin, macsec test profile changes
* Fix the format in add_profile api
The changes needed in various macsec unit tests and config plugin when we move to accept the type 7 encoded key format for macsec. This goes along with PR : sonic-net/sonic-swss#2892 raised earlier.
How I did it
Fix the regex for L4 port range in openconfig_acl.py.
How to verify it
Build image and install on Arista-720DT DUT, then try the repro steps in #16189 and confirmed the ACL rule be setup correctly:
Why I did it
According to ACL-Table-Type-HLD, the value type of MATCHES, ACTIONS and BIND_POINTS should be list instead of string. Opening this PR to update the definition of BMCDATA and BMCDATAV6.
How I did it
Update the definition of BMCDATA and BMCDATAV6 in minigraph-parser.
How to verify it
Verified by UT and build SONiC image.
Why I did it
To overwrite the default DSCP_TO_TC_MAP for tunnel traffic, the attribute sai_tunnel_support must be set to 1.
Before this change, the attribute is set only on dual-tor platform when remap is enabled.
This PR is to set the attribute on all Arista-7260cx3 devices.
Work item tracking
Microsoft ADO 24785776
How I did it
Update the config.bcm template for Arista-7260cx3 devices.
How to verify it
The change is verified by manually rendering the j2 on a T1 testbed.
8111 800G interface, split to 2x400G (each has 4 lanes) fails to change interface speed from 400G to 100G during deploy mg. In minigraph.xml, the interface speed configuration is good, but fails to generate the right value to config_db.json.
In order to support this SKU the speed transitioning should support both 4 lanes and 8 lanes in the port_config.ini.
Why I did it
before this change for a 400G to 100G transition, in all cases except when lanes are 8, we would continue and the line
ports.setdefault(port_name, {})['speed'] = port_speed_png[port_name]
would not be executed, hence the default speed will never be set for a case and config_db will not be updated,
where speed is transitioning from 400G to 100G or 40G, but lanes are not equal to 8.
In order for those cases to pass where lanes are not specifically 8, we need the change
Work item tracking
24242657
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Why I did it
Support default DNS configuration
How I did it
Use j2 template to generate default DNS configuration.
How to verify it
Run sonic-config-engine unit test.
Why I did it
For route registry service, in order to block hijacked routes, IBGP session needs to be set up from BGP sentinel service to SONiC, and BGP sentinel service advertise the same route with higher local-preference and no export community. So that SONiC takes the route from BGP sentinel as the best path and does not advertise the route to EBGP peers.
In order to do that, new route-maps are needed. So this change adds a new set of templates, keeping BGPSentinel peers out of the other templates.
Work item tracking
Microsoft ADO (number only): 24451346
How I did it
Add sentinel_community in constants.yml, route from BGPSentinel do not match this community will be denied.
Add support to convert BGPSentinel related configuration in the BGPPeerPassive element of the minigraph to a new BGP_SENTINELS table in CONFIG_DB
Add a new set of "sentinels" templates to docker-fpm-frr
Add a new BGP peer manager to bgpcfgd, to add neighbors from the BGP_SENTINELS table using the "sentinels" templates
Add a test case for minigraph.py, making sure the BGPSentinel and BGPSentinelV6 elements create BGP_SENTINELS DB entry.
Add a set of test cases for the new sentinels templates in sonic-bgpcfgd tests.
Add sonic-bgp-sentinel.yang and a set of testcases for the yang file.
How to verify it
Testcases and UT newly added would pass.
Setup IPv4 and IPv6 BGPSentinel services in minigraph, and load minigraph, show CONFIG_DB and "show runningconfig bgp", configuration would be loaded successfully.
Using t1-lag topo and setup IBGP session from BGPSentinel to SONiC loopback address, IBGP session would up.
Advertise route from BGPSentinel to T1 with sentinel_community, higher local-preference and no-export communiyt. In T1, show bgp route, the result is "Not advertise to any EBGP peer".
Withdraw the route in BGPSentinel, in T1, route would advertise to EBGP peers.
Advertise route from T1 that does not match sentinel_community, in T1, would not see the route in show bgp route.
#### Why I did it
We should not modify minigraph schema.
#### How I did it
Update minigraph.py and remove unit test.
#### How to verify it
Run sonic-config-engine unit test.
*What I did:
Enable BFD for Static Route for chassis-packet. This will trigger the use of the feature as defined in here: #13789
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
- Why I did it
Add support for static DNS configuration. According to sonic-net/SONiC#1262 HLD.
- How I did it
Add a new resolv-config.service that is responsible for transferring configuration from Config DB into /etc/resolv.conf file that is consumed by various subsystems in Linux to resolve domain names into IP addresses.
- How to verify it
Run the image compilation. Each component related to the static DNS feature is covered with the unit tests.
Run sonic-mgmt tests. Static DNS feature will be covered with the system tests.
Install the image and run manual tests.
Why I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
Work item tracking
Microsoft ADO (number only): 24101023
How I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
How to verify it
Ran unittest to verify this update:
Added support data for fabric monitoring in CONFIG_DB
The CONFIG_DB now has the FABRIC_MONITOR|FABRIC_MONITOR_DATA table for default value for fabric port monitoring. An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_MONITOR|FABRIC_MONITOR_DATA"
{'monErrThreshCrcCells': '1', 'monErrThreshRxCells': '61035156', 'monPollThreshIsolation': '1', 'monPollThreshRecovery': '8'}
The CONFIG_DB now also has a table for each fabric port for its isolate status.
An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_PORT|Fabric20"
{'alias': 'Fabric20', 'isolateStatus': 'False', 'lanes': '20'}
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.01 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* [202205] Update SOC properties for DLR_INIT based pfcwd recovery (#15217)
Why I did it
Update soc properties for certain roles that need to use pfcwd dlr init based recovery mechanism
How to verify it
Updated the templates on a 7050cx3 dual tor and 7260 T1 which satisfies these conditions and validated pfcwd recovery which uses DLR_INIT based mechanism. Also validated that this mechanism is not used on 7050cx3 single tor with the updated templates
Signed-off-by: Neetha John <nejo@microsoft.com>
* AclInterface and Management Interfaces are parsed on finding first valid node for it.
Above logic works for multi-asic scenarios where ACL Interface and Management Interfaces are present in DPG order {Host, Asicx, Asicy} but not when DPG is in {Asicx, Asicy, Host} order.
- Run pre-commit tox profile to trim all trailing blanks
- Use several commits with a per-folder based strategy
to ease their merge
Issue #15114
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.
Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
What I did:
Updated Static Route Attribute in Minigraph. NGS Minigraph has define semantics of static route differently.
See below for differences:-
Microsoft ADO: 17956325
Before
<AssociatedTo>8.0.0.1/32</AssociatedTo>
<Address>192.168.1.2,192.168.2.2</Address>
<AttachTo>PortChannel40,PortChannel50</AttachTo>
Now:
<Address>8.0.0.1</Address>
<AttachTo>PortChannel40,192.168.1.2;PortChannel50,192.168.2.2</AttachTo>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Update PG headroom settings ports based on port speed/cable length
* Updated XOFF settings to use chip level numbers than core
* Updated PG headroom based on uplink/downlink side
* fix for sonic-config-gen tests
* More fixes for unit test cases
* more test fixes
* Merged multiple functions into one
Following changes are done:
Added Support where if asic configuration is not present in minigraph sonic-cfggen do not error out but instead process it gracefully.
Use Case: In Supervisor we have number of asic are define as max possible but in minigraph configuration of only valid/available asics only are present. Without this change load_minigraph fails.
Microsoft ADO: 17956325
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
To support BGPMon sessions from each T2 linecard ASIC
Work item tracking
Microsoft ADO (number only): 17873174
How I did it
Added change in BGPMon configuration to use Loopback4096 as source interface, since this has a unique IP per ASIC.
How to verify it
Tested by manually setting up BGPMon session on T2 LC and verified that Loopback4096 could be used as source
There are chassis-packet and Single asic platforms which support this 400G to 100G/40G speed change via config.
Enabling this feature for all platforms which can support this. Keeping it enabled for all does not affect the platforms
which do not support this feature yet.
Signed-off-by: anamehra anamehra@cisco.com