#### Why I did it
Reset flex counters delay flag on config DB when enable_counters script is called to allow enablement of flex counters in orchagent.
#### How I did it
Push to config DB 'false' value for delay indication when enable_counters script is called before enabling the counters.
#### How to verify it
Observe counters are created when enable_counters script is called.
88a38f7 Ignore ALREADY_EXIST error in FDB creation (#1815)
b1c23f3 Change rif_rates.lua and port_rates.lua scripts to calculate rates correct (#1848)
Update sonic-utilities submodule with
cbc25d6 [config reload] Call systemctl reset-failed for snmp,telemetry,mgmt-framework services (#1773)
04dcd07 Improve config error handling on version_info (#1760)
e567a60 Load the database global_db. (#1752)
c15fb8f [sfputil] Gracefully handle improper 'specification_compliance' field (#1741)
39350f8 [dhcp_relay] Update CLI reference document and add a new API for ip address type (#1717)
18f13c6 [sonic-package-manager] switch from poetry-semver to semantic_version due to bugs found in poetry-semver (#1710)
b16724a [voq][chassis] VOQ cli show commands implementation (#1689)
9427cd6 [debug dump util] Match Infrastructure (#1666)
d9fb39b [route_check] Filter out VNET routes (#1612)
The unit test failure was due to missing bgp graceful restart select
defer time configuration in voq_chassis.conf. Modified sample output
data file voq_chassis.conf to include this configuration.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
with state of tdport from previous warm-reboot.
In case LAG was down before reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we incremented nr_of_tdports and add
tdport to lacp->wr.state. In case lacp->wr.state already had this tdport
we do not set new state for tdport but appened a new item in
lacp->wr.state. In case we preformed warm-reboot and PortChannel member
was down, after reboot PortChannel member became up next warm-reboot
will initialize teamd with PortChannel member in down state.
Fix this issue by calling stop_wr_mode() when LAG was down. This was probably intended but missed.
#### Why I did it
To fix an issue seen in warm-reboot-sad test cases.
#### How I did it
I fixed it in SONiC libteam patch that adds warm-reboot support. Details in commit description.
#### How to verify it
Run warm-reboot-sad test on t0-56 topology.
Fix for sonic-cfggen exception during platform string read during fresh install and start of sonic in multi asic, /var/run/redisX/ is created after database docker is started.
#### Why I did it
hostcfgd is starting at the same time as 'create_switch' method is called on orchagent process.
This introduce a degradation on the function execution time which eventually cause the fast-boot flow and a boot scenarion in general to run slower (~6 seconds).
This change will delay the start time of this daemon.
The aaastatsd will delay as well since it has a dependency on hostcfgd, so it is required to delay both.
90 seconds determined as the maximum allowed downtime for control plane to come back up on fast-boot flow.
#### How I did it
Add two timers for hostcfgd and aaastatsd services in order to delay the startup of these services.
#### How to verify it
Install an image with this change and observe the daemons start 90 seconds after the system boot.
enable automated test suites to selectively run relevant tests ( or not run tests ) based upon a new port_type identifier in hwsku.json
How I did it
Modified the valid optional fields in validity check for hwsku.json per recommendation from Joe in
https://github.com/Azure/sonic-mgmt/pull/2654/files
Co-authored-by: Carl Keene <keene@nokia.com>
Signed-off-by: Rajkumar Pennadam Ramamoorthy rpennadamram@marvell.com
Why I did it
Install sonic image from ONIE. Once system is up, execute "config reload" command.
Root cause is that "determine-reboot-cause.service" was in failed state.
root@sonic:/host/reboot-cause# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● determine-reboot-cause.service loaded failed failed Reboot cause determination service
How I did it
Fixed the issue by setting default reason to "REBOOT_CAUSE_UNKNOWN" instead of "None".
How to verify it
Check " determine-reboot-cause.service' loaded successfully post image installation from ONIE.
Verify "reboot-cause.txt" file is created and config reload succeeds.
1d3a810 [python coverage] fix result color bar (#202)
3f7b359 Add a template function that returns list of asics on module (#185)
abc2709 Fix decode error when parsing EEPROM fields (#199)
789b41e Load interval from thermal_policy.json (#178)
540ed1c Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict (#206)
716caf8 Unifying the platform api for get_pcie_aer_stats with PcieBase (#197)
Update sonic-utilities with following commit
3f3974e [show priority-group drop counters] Add user info output when user want to check PG counters and polling are disabled (#1678)
16606de Global and Interface commands for IPv6 Link local address enhancements (#1159)
0dcb2b6 Open record file in append mode (#1845)
03ce2ee [vnet/vxlan] Add support of multiple mappers for the VxLAN tunnel (#1843)
c5e90ab VOQ: Nexthop for remote VOQ LC should be created on inband OIF. (#1823)
834c5c8 Td2: Reclaim buffer from unused ports (#1830)
a5ad55c [Dynamic Buffer Calc] Bug fix: Don't create lossless buffer profile for active ports without speed configured (#1822)
f50368f [cfgmgr] Update Makefile.am to consume lib zmq (#1865)
* Update default cable len to 0m for TD2 (#8298)
* Update sonic-cfggen tests with the correct cable len
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
- With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Cfggen tests passed with the cable len update
I have been seeing intermittent (~40%) build failures with the same error described in PR https://github.com/Azure/sonic-buildimage/pull/6592, even with that fix present
```
/usr/bin/ld: mibgroup/ip-forward-mib/ipCidrRouteTable/.libs/ipCidrRouteTable_interface.o: file not recognized: file truncated
...
libtool: error: 'mibgroup/ip-forward-mib/inetCidrRouteTable/inetCidrRouteTable_interface.lo' is not a valid libtool object
make[5]: *** [Makefile:1020: libnetsnmpmibs.la] Error 1
make[5]: *** Waiting for unfinished jobs....
```
#### How I did it
Use `-j1` for the libsnmp build regardless of the value of `$(MULTIARCH_QEMU_ENVIRON)`
#### How to verify it
Performed 10 builds of the libsnmp target (`target/debs/buster/libsnmp-base_5.7.3+dfsg-5_all.deb`) with and without this change. Without the change, hit the error 40% of the time. With the change did not see the error at all
Signed-off-by: Justin Sherman <jusherma@cisco.com>
It can be that service is not enabled but UnitFilePreset=enabled (case
for Application Extension):
```
Loaded: loaded (/lib/systemd/system/cpu-report.service; disabled; vendor preset: enabled)
```
This makes existing logic skip enabling the service.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
e385212 [MPLS] Minor tweaks to VS for MPLS support for CRM polling of MPLS In-segments and NHs.
c163238 Add cisco-8000 checks to syncd_init_common (#839)
To include:
> 66e7817 2021-07-13 [pcied] Fix pcied failure to load due to 'pcied NameError: name 'self' is not defined' (Azure/sonic-platform-daemons#198)
> 3df6757 2021-07-08 [ci] fix result color bar in the code coverage report (Azure/sonic-platform-daemons#196)
2d2749a [xcvrd] add debug logs for y_cable change events/probes (#195)
b2c6102 Collect asic info and store in CHASSIS_STATE_DB (#175)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
#### Why I did it
Update submodule pointer for swss to include recent changes
4f1d726 [portsorch] fix errors when moving port from one lag to another. (#1797)
ae44701 [orchagent] Put port configuration to APPL_DB according to autoneg mode (#1769)
5295f91 Add failure handling for SAI get operations (#1768)
7c7c451 Revert recirc port change (#1813)
5528ebf Cleanup code (#1814)
8b149a3 Load the database global_db only once for show cli (#1712)
cd0e560 [config][interface][speed] Fixed the config interface speed in multiasic issue (#1739)
b595ba6 [fast-reboot] revert the change of disabling counter polling before fast-reboot (#1744)
8518820 [minigraph] Donot enable PFC watchdog for MgmtTsToR (#1734)
2213774 [CLI][show][bgp] Fix the show ip bgp network command (#1733)
3526507 [configlet] Python3 compatible syntax for extracting a key from the dict (#1721)
5b56b97 [sonic_installer] don't print errors when installing an image not supporting app ext (#1719)
a581955 [LLDP] Fix lldpshow script to enable display multiple MAC addresses on the same remote physical interface (#1657)
As per HLD - Azure/SONiC#625
FRR Patches:
0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.
0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.
0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.
Depends on:
Azure/sonic-utilities#1159Azure/sonic-swss#1463
Signed-off-by: Akhilesh Samineni akhilesh.samineni@broadcom.com
Why I did it
In the config_db.json generated by minigraph "admin_status" attribute is missing for the VOQ inband interface port in the PORT table.
How I did it
Changes done to add admin_status attribute for voq inband interface port, if it exists in the PORT table keys.
For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this
Why I did it
Currently SONiC use the 'isc-dhcp-relay' package to allow DHCP relay functionality on IPv4 networks only.
This will allow the IPv6 functionality along the IPv4 type.
How I did it
Edit supervisord template to start DHCPv6 instances when configured to do so on Config DB.
Align cfg unit test to the new change.
Add DHCPv6 relay minigraph parsing support and a suitable t0 topology xml file for UT.
How to verify it
Configure DHCPv6 agents as described on the feature HLD: Azure/SONiC#765
Test it with real client/server with IPv6 or use the dedicated automatic test: Azure/sonic-mgmt#3565
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Split docker-dhcp-relay.supervisord.conf.j2 template into several files for easier code maintenance
Why I did it
Allow deploying DHCPv6 servers following the implementation PR: #7772
How I did it
Add DHCPv6 to minigraph.py on sonic-cfggen tool and improve the unit test to cover this change.
How to verify it
Try to deploy a switch with DHCPv6 servers.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
Enhance DHCP monitor application following the implementation PR: https://github.com/Azure/sonic-buildimage/pull/7772
#### How I did it
Add the support for monitoring DHCPv6 packets.
#### How to verify it
Install an image with this PR and the implementation PR.
- Why I did it
Currently dhcp packets are disabled by the COPP manager for non ToRRouter type switches.
Even if the feature is enabled, DHCP packets wont hook to the CPU since the COPP manager will not trap this packets.
This change is to disable dhcp_relay by default for non ToRRouter switches from init_cfg.json.
With this approach, if the user want to enable the feature for non ToRRouter switches, manual enablement is required by the 'feature' configuration.
This is to keep the current approach for MSFT production issue with dhcp relay for non ToRRouter switched and allow the user to decide if to use it or not.
- How I did it
Configure dhcp_relay 'disabled' by default on init_cfg.json for non ToRRouter switches.
Remove the exclusion of dhcp packets on copp_cfg.json
- How to verify it
Enable dhcp_relay feature on a non ToRRouter switch.
Unit-tests modified so the default values on mocked CONFIG DB in 'test_vectors.py' for dhcp_relay will be 'disabled'.
This is by the change for 'init_cfg.json.j2'.
For ToRRouter the state will change from 'disabled' to 'enabled'.
Another test case added for a 'ToR' switch type, this is to test the state is 'enabled' if the user configured it to be so.
Why I did it
Currently hostcfgd is implemented in a way each feature which is enabled/disabled triggering execution of systemctl enable/unmask commands which eventually trigger 'systemctl daemon-reload' command.
Each call like this cost 0.6s and overall add a overhead of ~12 seconds of CPU time.
This change will verify the desired state of a feature and the current state of this feature on systemd and trigger a system call only when must.
How I did it
Check each feature status on systemd before executing a system call to enable and reload the systemctl daemon.
How to verify it
Build an image with this change and observe less system calls are executed.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
Why I did it
Static route configuration should not depend on BGP_ASN. Remove the dependency on BGP_ASN for StaticRouteMgr.
Fix#8027
How I did it
Check if BGP_ASN field before configuring static route redistribution and wait until BGP_ASN is available to enable static route redistribution.
How to verify it
Add unit test to cover the scenario and verify the functionality on a virtual switch.
Why I did it
systemd-sonic-generator limits multi-asic unit file instances to 10 (single digit instance number 0 - 10). This limitation needs to be removed to handle more than 10 asics.
MAX_NUM_TARGETS and MAX_NUM_INSTALL_LINES limits to 15 which is not sufficient for systems with more than 15 asics.
Inside get_unit_files(), strcmp produce incorrect results due to non null terminated string being compared.
Added build UT support for systemd-sonic-generator
Updates:
888701b [Mellanox] Remove mstdump from Mellanoxs collect dump script ([Azure/sonic-utilities#1706])
4818360 [sonic-package-manager] support warm/fast reboot for extension packages ([Azure/sonic-utilities#1554])
793b847 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' ([Azure/sonic-utilities#1679])
24fe1ac [show][config] support for interface alias for muxcable commands ([Azure/sonic-utilities#1699])
186d8513 Pcieutil to load the platform api first instead of using common api (#1672)
7a82c069 [Mellanox] Update mellanox dump generation to include SDK dumps (#1640)
38f8c068 [sfputil] Expose error status fetched from STATE_DB or platform API to CLI (#1658)
c5d00ae4 [pfcwd] Fix the return code in invalid case (#1691)
57dc4032 [ci]: Fix config prompt question issue (#1693)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Before this change, a process running inside every SONiC container dealt with FEATURE table 'auto_restart' field and depending on the value decided whether a container has to be killed or not.
If killed service auto restart mechanism restarts the container.
This change moves the logic from container to the host daemon - hostcfgd.
The 'auto_restart' handling is kept in supervisor-proc-exit-listener but now it is not required for container that wants to support auto restart feature.
hostcfgd refactoring - move feature handling in another class.
override systemd service Restart= setting from hostcfgd.
remove default systemd Restart=always.
Signed-off-by: Stepan Blyshchak stepanb@nvidia.com
- Why I did it
Remove the need to deal with container orchestration logic from the container itself. Leave this logic to the orchestrator - host OS.
- How I did it
hostcfgd configures 'Restart=' value for systemd service.
- How to verify it
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp enabled
root@r-tigon-11:/home/admin# show feature status | grep lldp
lldp enabled enabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 20 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 5 seconds lldp
root@r-tigon-11:/home/admin# sudo config feature autorestart lldp disabled
root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 35 seconds lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 3 seconds ago lldp
root@r-tigon-11:/home/admin# docker ps -a | grep lldp
65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 39 seconds ago lldp
root@r-tigon-11:/home/admin#
Advance submodule head for sonic-swss
32261636 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (Azure/sonic-swss#1786)
6c88e47a [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (Azure/sonic-swss#1781)
e86b900d [MPLS] sonic-swss changes for MPLS (Azure/sonic-swss#1686)
4c8e2b53 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (Azure/sonic-swss#1776)
36021246 [VS test stability] Skip flaky test for DPB (Azure/sonic-swss#1807)
c37cc1c5 Support for in-band-mgmt via management VRF (Azure/sonic-swss#1726)
1e3a532d Fix config prompt question issue (Azure/sonic-swss#1799)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
A recent version of contextlib2 (https://pypi.org/project/contextlib2/21.6.0/#history) has broken Python2 compatibility, so the version picked up by netaddr when using Python2 must be specified, or else builds fail
Co-authored-by: Tom Zhu <tom.zhu@metaswitch.com>