This change is to add a gbsyncd container to accommodate the syncd process and the SAI libraries for the Credo gearbox chips.
How I did it
This container works similar to the existing Broadcom syncd container. Its main difference is that the SAI-related dynamic libraries are replaced by the ones for Credo gearbox chips, and the container only reacts to SAI events for the gearbox chips. The SAI libraries will be provided by the package libsai-credo_1.0_amd64.deb.
For the image build, the added container will be built and included in the Broadcom platform image, after $(LIBSAI_CREDO)_URL = is replaced to the correct value. For now, as $(LIBSAI_CREDO)_URL is empty, the container build is skipped in the image build.
After the container is included in the image, in the runtime, the container will begin with checking the existence of /usr/share/sonic/hwsku/gearbox_config.json; if that file is not provided, the container will exit by itself. Therefore, for platforms unrelated to the Credo chips, as long as they are not providing the file, they will not be affected by this change.
Why I did it
Added a check in determining CPU reset in fast-/warm-reboot in their respective platform plugin.
Introducing reboot plugin for "reboot" command to handle its own platform plugin.
How I did it
On branch s6100_fast_warm_check
Changes to be committed:
(use "git reset HEAD ..." to unstage)
modified: ../../debian/platform-modules-s6100.install
modified: ../scripts/fast-reboot_plugin
modified: ../scripts/platform_reboot_override
new file: ../scripts/reboot_plugin
modified: ../scripts/track_reboot_reason.sh
modified: chassis.py
How to verify it
Triggered cold-reset inside fast-reboot to test out the reboot-cause 2.0 API.
Why I did it
serial-getty service exited in Dell S6100 device randomly.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
Included commits:
* dd01b56 disk_check updates:
* 8a74d03 [CLI][show][bgp] Fix the show ip bgp network command
* 679a4ba [MACsec]: Allow upgrade-docker for macsec container
* e9c73e8 [CLI][MPLS][Show] Added multi ASIC support for 'show mpls command'.
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Changes in the buffer template did not trigger a new sonic-config-engine wheel build and the cached build was used for the PR merge. When the new wheel build got trigged, few sonic-cfggen testcases started failing because of the changes made in the buffer templates.
How to verify it
Updated the dependency to include buffer templates and built sonic_config_engine-1.0-py3-none-any.whl. Testcase failure was seen as expected
I have been seeing intermittent (~40%) build failures with the same error described in PR https://github.com/Azure/sonic-buildimage/pull/6592, even with that fix present
```
/usr/bin/ld: mibgroup/ip-forward-mib/ipCidrRouteTable/.libs/ipCidrRouteTable_interface.o: file not recognized: file truncated
...
libtool: error: 'mibgroup/ip-forward-mib/inetCidrRouteTable/inetCidrRouteTable_interface.lo' is not a valid libtool object
make[5]: *** [Makefile:1020: libnetsnmpmibs.la] Error 1
make[5]: *** Waiting for unfinished jobs....
```
#### How I did it
Use `-j1` for the libsnmp build regardless of the value of `$(MULTIARCH_QEMU_ENVIRON)`
#### How to verify it
Performed 10 builds of the libsnmp target (`target/debs/buster/libsnmp-base_5.7.3+dfsg-5_all.deb`) with and without this change. Without the change, hit the error 40% of the time. With the change did not see the error at all
Signed-off-by: Justin Sherman <jusherma@cisco.com>
This PR creates a directory firmware on the HOST with the path /usr/share/sonic/firmware, as well as this is
mounted on PMON container with the same path /usr/share/sonic/firmware. This is required for firmware
upgrade support for muxcable as currently by design all Y-Cable API's are called by xcvrd. As such if CLI has
to transfer a file to PMON we need to mount a directory from host to PMON just for getting the firmware files.
Hence we require this change.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
0443e66050256a87f8e92db7cd3c36cc139ebe14 (HEAD -> master, origin/master, origin/HEAD) Remove DB Directory removal as part of make clean (#84)
085f29d1247f0333e6038751fa445b6068fcf987 Fix unhandled nil err check to prevent rpc causing a crash (#78)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The unit test failure was due to missing bgp graceful restart select
defer time configuration in voq_chassis.conf. Modified sample output
data file voq_chassis.conf to include this configuration.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
Changes to allow starting per asic services like swss and syncd only if the platform vendor codedetects the asic is detected and notified. The systemd services ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@
There is also a requirement that management, telemetry, snmp dockers can start even if all asic services are not up.
Why I did it
For VOQ chassis, the fabric cards will have 1-N asics. Also, there could be multiple removable fabric cards. On the supervisor, swss and syncd containers need to be started only if the fabric-card is in Online state and respective asics are detected by the kernel. Using systemd, the dependent services can be in inactive state.
How I did it
Introduce a mechanism where all ASIC dependent service wait on its state to be published via PMON to REDIS. Once the subscription is received, the service proceeds to create respective dockers.
For fixed platforms, systemd is unchanged i.e. the service bring up and docker creation happens in the start()/ExecStartPre routine of the .sh scripts.
For VOQ chassis platform on supervisor, the service bringup skips docker creation in the start() routine, but does it in the wait()/ExecStart routine of the .sh scrips.
Management dockers are decoupled from ASIC docker creation.
For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this
Why I did it
In the config_db.json generated by minigraph "admin_status" attribute is missing for the VOQ inband interface port in the PORT table.
How I did it
Changes done to add admin_status attribute for voq inband interface port, if it exists in the PORT table keys.
Why I did it
There are scenarios that End-of-RIB comes from a part of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has the default value of 360 seconds, FRR would not set up routes and all routes would be removed after reconciliation. This PR reduces the route selection deferral timer so that at least routes to parts of the peers get restored at the point of reconciliation.
Fix#7488
How I did it
Reduce route selection deferral timer for bgp graceful restart to 15 seconds.
Why I did it
BIOS upgrade on rare cases cannot guarantee bus value remain the same on every BIOS release. Ignoring this field in order for pcied not to fail but still verify device id in a different way. The solution is future proof and will not require changes in code when new BIOS version is available
How I did it
Since bus is not a fixed value (it is determined by the bios version) we are ignoring this field, and instead checking if there is a device that match on all other fields that and in addition has a matching device id.
How to verify it
Verify no errors or failures in pcied on different BIOS version with the same code base.
#### Why I did it
Enhance DHCP monitor application following the implementation PR: https://github.com/Azure/sonic-buildimage/pull/7772
#### How I did it
Add the support for monitoring DHCPv6 packets.
#### How to verify it
Install an image with this PR and the implementation PR.
Update sonic-sairedis submodule to include below commits:
84fa50a Revert "[vs]: Start syncd by passing context configuration file and global context index. (#832)" (#859)
736dc3b Remove redudnant mention of platform cisco-8000 (#856)
969ad94 Support for cisco-8000 platform for sonic-sairedis/syncd (#823)
1eacd05 [sairedis] Client/Server add support for SAI stats api (#855)
59fedfa [sairedis] Client/Server support SAI fdb flush api (#853)
5c2aaae [syncd] bulk OID remove requires RID (#854)
7da0894 [sairedis] Client/Server support SAI query API (#848)
443ad36 [sairedis] Style refactor cleanup (#850)
Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
Add needed code to pddf_custom_psu.c to deal with multi PSU and get SN.
How to verify it
Plugin new PSU (3Y) and test,
```
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_serial_num
S0A000X601919000013
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_model_name
YESM1300AM
root@sonic:/home/admin# pddf_psuutil mfrinfo
PSU Status Manufacturer ID Model Serial Fan Airflow Direction
PSU1 NOT OK 3Y POWER YESM1300AM S0A000X601919000007 exhaust
PSU2 OK 3Y POWER YESM1300AM S0A000X601919000013 exhaust
```
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
1. Implement FanDrawer-Fan hierarchy.
2. Enable thermalctld, disable pcied.
3. Implement SystemLED in Chassis.
4. Correct Fan direction
5. Implement require Fan APIs for SystemHealthMonitoring.
6. Handle non-ascii character while reading PSU model/serial num.
```
Check if System-health can pass the check and display the SystemLED correctly.
///////// booting, DIAG_LED = GREEN_BLINKING /////////
root@sonic:/tmp# show system-health detail
System is currently booting...
root@sonic:/tmp# cat /sys/class/leds/diag/brightness
5
///////// container_checker fail, DIAG_LED = AMBER /////////
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_AMBER
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: container_checker is not Status ok
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
container_checker Not OK Program
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
--------------- -------- ------
asic Ignored Device
psu.temperature Ignored Device
///////// skip container_checker, DIAG_LED = GREEN /////////
root@sonic:/sys/bus/i2c/devices# vi /usr/share/sonic/device/x86_64-accton_as4630_54te-r0/system_health_monitoring_config.json
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_GREEN
Services:
Status: OK
Hardware:
Status: OK
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
----------------- -------- -------
container_checker Ignored Service
psu.temperature Ignored Device
asic Ignored Device
```
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
lazy_re had an issue when importing sonic-cfggen in another application that
uses re.search(). There is no much improvement of lazy_re today after many
other good optimization work done for sonic-cfggen. It served as a quick
temporary solution.
Some quick test for fast-reboot and warm-reboot done on top of 201911 branch:
Fast-reboot: from ASIC reset to ports in up state:
with lazy_re: 18 sec
without lazy_re: 18 sec
Warm-reboot: LAG restoration time:
with lazy_re: 73 sec
without lazy_re: 72 sec
So, there is no real optimization since the number of sonic-cfggen calls is greatly
reduced in latest SONiC. This means it is time to revert this change.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Move SAI libraries for Broadcom for XGS and DNX families to 5.0.0.6
Included fixes
```
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012192505 | [4.3] Re-encap IPinIP decap packets
CS00012192502 | [3.7.5.2] Start LED shell script execution on all DELL based platforms causing all ports flapping on SAI 3.7.5.2
CS00012191363 | [4.3] Support of memscan thread to detect TCAM parity error
CS00012190932 | [4.3] SAI_PORT_PFC_X_RX_PKTS incremented incorrectly even when no PFC frames are received on that priority
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00011382163 | [4.4] Support warm-boot from 3.5 to 4.3
CS00011318937 | [4.3] MACSec SAI Support for Jericho2c+
CS00011318926 | [4.3] Provide SAI support for Jericho2c+
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012195261 | [4.3][5.0][TD3]VLAN tagged IP packet received on untagged interface being routed instead of dropped
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012196056 | [4.3.3.8][WARMBOOT] syncd[2584]: segfault at 5616ad6c3d80 ip 00007f61e0c6bc65 sp 00007fff0c5a7a90 error 4 in libsai.so.1.0[7f61e0a95000+3cd8000]
CS00012195262 | [4.3][5.0][TD3] Malformed IP packet(missing IP header) received on a VLAN Interface is flooded to other LVAN members instead of being dropped
CS00012195956 | [4.3.3.8] [TD3]Syncd Crash at brcm_sai_tnl_mp_create_tunnel()
PR 4346163: Add support for AN/LT
```
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>