I have been seeing intermittent (~40%) build failures with the same error described in PR https://github.com/Azure/sonic-buildimage/pull/6592, even with that fix present:
```
/usr/bin/ld: mibgroup/ip-forward-mib/ipCidrRouteTable/.libs/ipCidrRouteTable_interface.o: file not recognized: file truncated
...
libtool: error: 'mibgroup/ip-forward-mib/inetCidrRouteTable/inetCidrRouteTable_interface.lo' is not a valid libtool object
make[5]: *** [Makefile:1020: libnetsnmpmibs.la] Error 1
make[5]: *** Waiting for unfinished jobs....
```
#### How I did it
Use `-j1` for the libsnmp build regardless of the value of `$(MULTIARCH_QEMU_ENVIRON)`
#### How to verify it
Performed 10 builds of the libsnmp target (`target/debs/buster/libsnmp-base_5.7.3+dfsg-5_all.deb`) with and without this change. Without the change, the error occurred in 40% of the builds. With the change, the error did not occur at all.
Signed-off-by: Justin Sherman <jusherma@cisco.com>
Why I did it
Update Makefile, so it does the following:
For a given platform, check whether platform/checkout/.ini exists and, if so, run platform/checkout/template.j2. This allows platform code to be checked out during the 'make configure' stage.
How I did it
git clone git@github.com:Azure/sonic-buildimage.git
mkdir platform/cisco-8000
make init
make configure PLATFORM=cisco-8000
make all
Why I did it
The serial-getty service exited randomly on the Dell S6100 device.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
Why I did it
The platform test suite failed for a few APIs on the DellEMC Z9332f platform.
How I did it
Modified the APIs to return the expected values in the script.
How to verify it
Run platform test suite after making the changes.
A service may not be enabled even though UnitFilePreset=enabled (this is the
case for an Application Extension):
```
Loaded: loaded (/lib/systemd/system/cpu-report.service; disabled; vendor preset: enabled)
```
This makes existing logic skip enabling the service.
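
The distinction between the unit's own state and its vendor preset can be sketched as follows. This is a minimal illustration, not the actual fix: it assumes the `UnitFileState` and `UnitFilePreset` properties as printed by `systemctl show`, and the helper names `parse_unit_properties` and `needs_enable` are hypothetical.

```python
def parse_unit_properties(show_output):
    """Parse KEY=VALUE lines in the style printed by `systemctl show <unit>`."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def needs_enable(props):
    """Decide from the unit's own state and ignore the vendor preset.

    Assumption: a unit whose UnitFileState is not 'enabled' still has to be
    enabled, even when UnitFilePreset reports 'enabled'.
    """
    return props.get("UnitFileState") != "enabled"


# Sample text mirroring the "disabled; vendor preset: enabled" case above.
sample = "UnitFileState=disabled\nUnitFilePreset=enabled\n"
props = parse_unit_properties(sample)
print(needs_enable(props))
```

Keying the decision on `UnitFileState` alone means the preset never masks a unit that still needs enabling.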
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
e385212 [MPLS] Minor tweaks to VS for MPLS support for CRM polling of MPLS In-segments and NHs.
c163238 Add cisco-8000 checks to syncd_init_common (#839)
To include:
> 66e7817 2021-07-13 [pcied] Fix pcied failure to load due to 'pcied NameError: name 'self' is not defined' (Azure/sonic-platform-daemons#198)
> 3df6757 2021-07-08 [ci] fix result color bar in the code coverage report (Azure/sonic-platform-daemons#196)
2d2749a [xcvrd] add debug logs for y_cable change events/probes (#195)
b2c6102 Collect asic info and store in CHASSIS_STATE_DB (#175)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
#### Why I did it
Update submodule pointer for swss to include recent changes
4f1d726 [portsorch] fix errors when moving port from one lag to another. (#1797)
ae44701 [orchagent] Put port configuration to APPL_DB according to autoneg mode (#1769)
5295f91 Add failure handling for SAI get operations (#1768)
7c7c451 Revert recirc port change (#1813)
5528ebf Cleanup code (#1814)
8b149a3 Load the database global_db only once for show cli (#1712)
cd0e560 [config][interface][speed] Fixed the config interface speed in multiasic issue (#1739)
b595ba6 [fast-reboot] revert the change of disabling counter polling before fast-reboot (#1744)
8518820 [minigraph] Donot enable PFC watchdog for MgmtTsToR (#1734)
2213774 [CLI][show][bgp] Fix the show ip bgp network command (#1733)
3526507 [configlet] Python3 compatible syntax for extracting a key from the dict (#1721)
5b56b97 [sonic_installer] don't print errors when installing an image not supporting app ext (#1719)
a581955 [LLDP] Fix lldpshow script to enable display multiple MAC addresses on the same remote physical interface (#1657)
- Why I did it
To fix failing test cases for the Haliburton platform APIs that were found by the platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions following the latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung <wgetbumr@celestica.com>
#### Why I did it
I made this change to support warm/fast reboot for SONiC extension packages as per HLD Azure/SONiC#682.
#### How I did it
I extended manifest.json.j2 with new warm/fast reboot related fields and also extended sonic_debian_extension.j2 script template to generate the shutdown order files for warm and fast reboot.
- Why I did it
Make DHCP relay docker an extension. DHCP relay now carries dhcp relay commands CLI plugin and has a complete manifest.
It is installed as an extension if INCLUDE_DHCP_RELAY is set to y.
DEPENDS on #5939
- How I did it
Modify DHCP relay docker makefile and dockerfile. Make changes to sonic_debian_extension.j2 to install sonic packages.
I moved DHCP related CLI tests from sonic-utilities to DHCP relay docker.
This PR introduces a way to write a plugin as part of docker image and run the tests from cli-plugin-tests directory under docker directory.
The test result is available in target/docker-dhcp-relay.gz.log:
[ REASON ] : target/docker-dhcp-relay.gz does not exist NON-EXISTENT PREREQUISITES: docker-start target/docker-config-engine-buster.gz-load target/python-wheels/sonic_utilities-1.2-py3-none-any.whl-install target/debs/buster/python3-swsscommon_1.0.0_amd64.deb-install
[ FLAGS FILE ] : []
[ FLAGS DEPENDS ] : []
[ FLAGS DIFF ] : []
============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /sonic/dockers/docker-dhcp-relay/cli-plugin-tests, inifile:
plugins: cov-2.6.0
collecting ... collected 10 items
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_plugin_registration PASSED [ 10%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_nonexist_vlanid PASSED [ 20%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_vlanid PASSED [ 30%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_ip PASSED [ 40%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_exist_ip PASSED [ 50%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_del_dhcp_relay_dest PASSED [ 60%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_nonexist_dhcp_relay_dest PASSED [ 70%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_dhcp_relay_dest_with_nonexist_vlanid PASSED [ 80%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_plugin_registration PASSED [ 90%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_dhcp_relay_column_output PASSED [100%]
=============================== warnings summary ===============================
/usr/local/lib/python3.7/dist-packages/tabulate.py:7
/usr/local/lib/python3.7/dist-packages/tabulate.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import namedtuple, Iterable
-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================== 10 passed, 1 warnings in 0.35 seconds =====================
As per HLD - Azure/SONiC#625
FRR Patches:
0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd/bgp_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.
0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.
0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.
Depends on:
Azure/sonic-utilities#1159
Azure/sonic-swss#1463
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Move SAI libraries for Broadcom for XGS and DNX families to 5.0.0.6
Included fixes
```
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012192505 | [4.3] Re-encap IPinIP decap packets
CS00012192502 | [3.7.5.2] Start LED shell script execution on all DELL based platforms causing all ports flapping on SAI 3.7.5.2
CS00012191363 | [4.3] Support of memscan thread to detect TCAM parity error
CS00012190932 | [4.3] SAI_PORT_PFC_X_RX_PKTS incremented incorrectly even when no PFC frames are received on that priority
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00011382163 | [4.4] Support warm-boot from 3.5 to 4.3
CS00011318937 | [4.3] MACSec SAI Support for Jericho2c+
CS00011318926 | [4.3] Provide SAI support for Jericho2c+
CS00012195261 | [4.3][5.0][TD3]VLAN tagged IP packet received on untagged interface being routed instead of dropped
CS00012196056 | [4.3.3.8][WARMBOOT] syncd[2584]: segfault at 5616ad6c3d80 ip 00007f61e0c6bc65 sp 00007fff0c5a7a90 error 4 in libsai.so.1.0[7f61e0a95000+3cd8000]
CS00012195262 | [4.3][5.0][TD3] Malformed IP packet(missing IP header) received on a VLAN Interface is flooded to other VLAN members instead of being dropped
CS00012195956 | [4.3.3.8] [TD3]Syncd Crash at brcm_sai_tnl_mp_create_tunnel()
PR 4346163: Add support for AN/LT
```
Changes to allow starting per-asic services like swss and syncd only after the platform vendor code detects the asic and publishes a notification. The systemd service ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@
There is also a requirement that management, telemetry, snmp dockers can start even if all asic services are not up.
Why I did it
For VOQ chassis, the fabric cards will have 1-N asics. Also, there could be multiple removable fabric cards. On the supervisor, swss and syncd containers need to be started only if the fabric-card is in Online state and respective asics are detected by the kernel. Using systemd, the dependent services can be in inactive state.
How I did it
Introduce a mechanism where all ASIC-dependent services wait for their state to be published via PMON to REDIS. Once the subscription notification is received, the service proceeds to create the respective dockers.
For fixed platforms, systemd is unchanged i.e. the service bring up and docker creation happens in the start()/ExecStartPre routine of the .sh scripts.
For the VOQ chassis platform on the supervisor, the service bringup skips docker creation in the start() routine, but does it in the wait()/ExecStart routine of the .sh scripts.
Management dockers are decoupled from ASIC docker creation.
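
The "wait until the ASIC state is published" flow described above can be sketched with an in-memory stand-in. This is only an illustration under stated assumptions: the real mechanism uses PMON and REDIS, whereas the hypothetical `AsicStateWaiter` class below replaces the database subscription with a `threading.Condition`.

```python
import threading


class AsicStateWaiter:
    """In-memory stand-in for the 'PMON publishes, service waits' flow."""

    def __init__(self):
        self._cond = threading.Condition()
        self._detected = set()

    def publish(self, asic_id):
        # Hypothetical hook: called when platform code reports the ASIC as detected.
        with self._cond:
            self._detected.add(asic_id)
            self._cond.notify_all()

    def wait_for(self, asic_id, timeout=None):
        # Block until the ASIC has been published or the timeout expires.
        # Returns True if the ASIC was published, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: asic_id in self._detected, timeout)


waiter = AsicStateWaiter()
# Simulate the publisher reporting ASIC 0 shortly after the service starts waiting.
threading.Timer(0.05, waiter.publish, args=(0,)).start()
ready = waiter.wait_for(0, timeout=2.0)      # ASIC 0 gets published
missing = waiter.wait_for(1, timeout=0.1)    # ASIC 1 is never published
print(ready, missing)
```

In this sketch a service for an undetected ASIC simply stays blocked (or times out) without affecting other waiters, which mirrors the requirement that management dockers start independently.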
Why I did it
On rare occasions, a BIOS upgrade cannot guarantee that the bus value remains the same across BIOS releases. This change ignores the bus field so that pcied does not fail, while still verifying the device id in a different way. The solution is future-proof and will not require code changes when a new BIOS version becomes available.
How I did it
Since bus is not a fixed value (it is determined by the BIOS version), we ignore this field and instead check whether there is a device that matches on all other fields and, in addition, has a matching device id.
How to verify it
Verify no errors or failures in pcied on different BIOS version with the same code base.
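
The bus-agnostic check described in this change might look roughly like the sketch below. The field names (`bus`, `dev`, `fn`, `id`) and the function `has_matching_device` are assumptions for illustration and are not taken from the actual pcied code.

```python
def has_matching_device(expected, detected_devices, ignored_fields=("bus", "name")):
    """Return True if any detected device matches `expected` on every
    field except the ignored ones (the bus may change between BIOS versions)."""
    keys = [k for k in expected if k not in ignored_fields]
    return any(
        all(device.get(k) == expected[k] for k in keys)
        for device in detected_devices
    )


# Hypothetical entries: same device, but enumerated on a different bus.
expected = {"bus": "03", "dev": "00", "fn": "0", "id": "cf6c", "name": "example device"}
detected = [
    {"bus": "01", "dev": "00", "fn": "0", "id": "1234"},
    {"bus": "05", "dev": "00", "fn": "0", "id": "cf6c"},
]
print(has_matching_device(expected, detected))
```

Because the device id is still compared, a device that moved to a different bus passes, while a missing or different device still fails.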
Why I did it
In the config_db.json generated by minigraph, the "admin_status" attribute is missing for the VOQ inband interface port in the PORT table.
How I did it
Add the admin_status attribute for the VOQ inband interface port if it exists in the PORT table keys.
For multi-asic, the back-end asics use the IP address of Loopback4096 as the BGP router id. In a VOQ multi-asic chassis there are no back-end asics. All the asics are front end, and the iBGP connections are established via Ethernet-IB of the asics. Since these asics are not designated as BackEnd, the IP address of interface Loopback0 is used as the BGP router id. Because the IP address of Loopback0 is the same for all the asics in the line card, the same router id is used for the VOQ iBGP configurations, and hence the iBGP connections are not established. Changes are done to fix this.
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>
Avoid initializing sfp/thermal/components/fan/psu/leds on simx and create vpd_info file on hw_management when we use mellanox simulator platform
- Why I did it
This is a fix for an issue on Mellanox simulator platforms: syseepromd failed in the pmon docker, and "decode-syseeprom" failed as well.
- How I did it
Before initializing thermal/components/fan/psu/leds, check whether we are running on simx.
Create the vpd_info file in the hw_management folder.
- How to verify it
Check that the syseepromd process was loaded properly in the pmon docker.
Check that decode-syseeprom works without errors or warnings.
- Why I did it
To prevent a Python exception when executing the warm-reboot command on the Mellanox simulator platform
- How I did it
Return None from the watchdog Python script when the watchdog file does not exist
- How to verify it
warm-reboot runs without the Python error. An error message will still appear in the log in these cases.
To avoid this error message, the watchdog can be simulated on the Mellanox simulator platform.
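
The "return None when the watchdog file does not exist" behavior can be sketched as below. This is a simplified illustration, assuming watchdog devices appear as files with a `watchdog` prefix in a device directory; the function name `get_watchdog_path` and its parameters are hypothetical and do not come from the platform code.

```python
import os


def get_watchdog_path(dev_dir="/dev", prefix="watchdog"):
    """Return the path of the first watchdog device, or None if none exists.

    Returning None (instead of raising) lets callers such as a reboot
    script continue on platforms without a watchdog, e.g. a simulator.
    """
    try:
        names = sorted(n for n in os.listdir(dev_dir) if n.startswith(prefix))
    except OSError:
        # Directory missing or unreadable: treat as "no watchdog".
        return None
    return os.path.join(dev_dir, names[0]) if names else None


path = get_watchdog_path()
if path is None:
    print("no watchdog device found")
else:
    print("watchdog device:", path)
```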
Why I did it
Update XGS and DNX SAI to 5.0.0.4 and additional flags needed in saibcm-modules
The following CSPs are merged in 5.0.0.4
CS00012182148 [4.3] Rate Limit Parity error message to syncd/sonic.
CS00012178692 [4.3] ACL drops counted as interface drops
CS00012183901 [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012070713 [SAI 4.3 , DNX, 8690] Everflow ACL creation fails - brcm_sai_dnx_create_acl_table API fails, with unknown attribute error.
CS00012023263 [4.4] TD3/TH2 : Support 4 lossless queues(2 SW PFCWD and 2 HW PFCWD)
CS00012019578 [4.4] Pre FEC bit-error rate (BER) - DNX and XGS (TD and TH 50/100G)
How I did it
Change the various makefiles to include the new SAI release and update the opennsl-modules.
Why I did it
Allows users to host their own local docker registries and use them via the REGISTRY_SERVER and REGISTRY_PORT environment variables
How I did it
Only set REGISTRY_SERVER and REGISTRY_PORT in rules/config if they are unset.
How to verify it
Export the environment variables REGISTRY_SERVER and REGISTRY_PORT so that they point to an alternative docker registry. Export the environment variable ENABLE_DOCKER_BASE_PULL set to y.
Ensure the required sonic-slave docker images are not present locally, but are available in the docker registry
Execute make init and make configure
Confirm that the appropriate docker images were pulled from the appropriate docker registry, and not built locally
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot
Signed-off-by: Dror Prital <drorp@nvidia.com>
- Why I did it
* For SAI - Advance to adopt the following fixes:
1. Better handling of unimplemented object types for resource availability
2. Fix ext dump when saidump is triggered from 2nd process (saidump utility) other than main adapter host (syncd in SONiC)
* For SDK\FW:
- Changes and new features:
1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM).
2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE
3. Improved performance of "per-port-buffer" counters
4. Added support for Kernel 5.10
- Bug fixes:
On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds
Signed-off-by: Dror Prital <drorp@nvidia.com>
Why I did it
Currently, SONiC uses the 'isc-dhcp-relay' package to provide DHCP relay functionality on IPv4 networks only.
This change adds IPv6 relay functionality alongside IPv4.
How I did it
Edit the supervisord template to start DHCPv6 instances when configured to do so in Config DB.
Align cfg unit test to the new change.
Add DHCPv6 relay minigraph parsing support and a suitable t0 topology xml file for UT.
How to verify it
Configure DHCPv6 agents as described on the feature HLD: Azure/SONiC#765
Test it with real client/server with IPv6 or use the dedicated automatic test: Azure/sonic-mgmt#3565
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Split docker-dhcp-relay.supervisord.conf.j2 template into several files for easier code maintenance
Why I did it
Allow deploying DHCPv6 servers following the implementation PR: #7772
How I did it
Add DHCPv6 to minigraph.py on sonic-cfggen tool and improve the unit test to cover this change.
How to verify it
Try to deploy a switch with DHCPv6 servers.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
Enhance DHCP monitor application following the implementation PR: https://github.com/Azure/sonic-buildimage/pull/7772
#### How I did it
Add the support for monitoring DHCPv6 packets.
#### How to verify it
Install an image with this PR and the implementation PR.
- Why I did it
Currently, DHCP packets are disabled by the COPP manager for non-ToRRouter switches.
Even if the feature is enabled, DHCP packets won't reach the CPU since the COPP manager will not trap these packets.
This change disables dhcp_relay by default for non-ToRRouter switches in init_cfg.json.
With this approach, a user who wants to enable the feature on a non-ToRRouter switch must enable it manually through the 'feature' configuration.
This keeps the current behavior that addresses the MSFT production issue with DHCP relay on non-ToRRouter switches, while letting the user decide whether to use it.
- How I did it
Configure dhcp_relay 'disabled' by default on init_cfg.json for non ToRRouter switches.
Remove the exclusion of dhcp packets on copp_cfg.json
- How to verify it
Enable dhcp_relay feature on a non ToRRouter switch.
Unit tests are modified so that the default value for dhcp_relay in the mocked CONFIG DB in 'test_vectors.py' is 'disabled'.
This follows the change to 'init_cfg.json.j2'.
For ToRRouter, the state will change from 'disabled' to 'enabled'.
Another test case is added for a 'ToR' switch type to test that the state is 'enabled' if the user configured it to be so.
Why I did it
Currently, hostcfgd is implemented so that each feature that is enabled or disabled triggers systemctl enable/unmask commands, which eventually trigger a 'systemctl daemon-reload' command.
Each such call costs 0.6s, adding an overall overhead of ~12 seconds of CPU time.
This change compares the desired state of a feature with its current state in systemd and triggers a system call only when necessary.
How I did it
Check each feature status on systemd before executing a system call to enable and reload the systemctl daemon.
How to verify it
Build an image with this change and observe that fewer system calls are executed.
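
The idea of comparing desired and current state before calling systemctl can be sketched as follows. This is a simplified illustration rather than the hostcfgd implementation: the function names and the dictionaries used as input are hypothetical, and only the enable path is covered.

```python
def features_needing_enable(desired, current):
    """Return features that should be enabled but are not enabled in systemd.

    desired: feature name -> desired state from the configuration
    current: feature name -> state currently reported by systemd
    """
    return [
        feature
        for feature, want in sorted(desired.items())
        if want == "enabled" and current.get(feature) != "enabled"
    ]


def plan_commands(desired, current):
    """Build the systemctl command lines to run; empty when nothing changed."""
    commands = []
    for feature in features_needing_enable(desired, current):
        unit = f"{feature}.service"
        commands.append(["systemctl", "unmask", unit])
        commands.append(["systemctl", "enable", unit])
    return commands


desired = {"swss": "enabled", "dhcp_relay": "enabled", "telemetry": "disabled"}
current = {"swss": "enabled", "dhcp_relay": "disabled", "telemetry": "disabled"}
for cmd in plan_commands(desired, current):
    print(" ".join(cmd))
```

Features already in the desired state produce no commands, so the per-call cost is paid only for features that actually change.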
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
NOTE: This is cherry-pick from 1911/2012 to master.
- Why I did it
To fix LAG IP configuration race
- How I did it
Extended timeout for teammgrd
- How to verify it
Add >80 router LAGs. Do config reload
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
After https://github.com/Azure/sonic-buildimage/pull/7598 the packages.json generation is broken. This change fixes it and makes the whole build fail if generation fails.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Static route configuration should not depend on BGP_ASN. Remove the dependency on BGP_ASN for StaticRouteMgr.
Fix #8027
How I did it
Check whether the BGP_ASN field is present before configuring static route redistribution, and wait until BGP_ASN is available to enable static route redistribution.
How to verify it
Add unit test to cover the scenario and verify the functionality on a virtual switch.
Why I did it
systemd-sonic-generator limits multi-asic unit file instances to 10 (single-digit instance numbers 0 - 9). This limitation needs to be removed to handle more than 10 asics.
MAX_NUM_TARGETS and MAX_NUM_INSTALL_LINES are limited to 15, which is not sufficient for systems with more than 15 asics.
Inside get_unit_files(), strcmp produces incorrect results because a non-null-terminated string is being compared.
Added build UT support for systemd-sonic-generator
Updates:
888701b [Mellanox] Remove mstdump from Mellanoxs collect dump script ([Azure/sonic-utilities#1706])
4818360 [sonic-package-manager] support warm/fast reboot for extension packages ([Azure/sonic-utilities#1554])
793b847 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' ([Azure/sonic-utilities#1679])
24fe1ac [show][config] support for interface alias for muxcable commands ([Azure/sonic-utilities#1699])