This PR creates a directory firmware on the HOST with the path /usr/share/sonic/firmware, as well as this is
mounted on PMON container with the same path /usr/share/sonic/firmware. This is required for firmware
upgrade support for muxcable as currently by design all Y-Cable API's are called by xcvrd. As such if CLI has
to transfer a file to PMON we need to mount a directory from host to PMON just for getting the firmware files.
Hence we require this change.
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.
Why I did it
To align with the changes in Azure/sonic-swss#1830
How to verify it
With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
0443e66050256a87f8e92db7cd3c36cc139ebe14 (HEAD -> master, origin/master, origin/HEAD) Remove DB Directory removal as part of make clean (#84)
085f29d1247f0333e6038751fa445b6068fcf987 Fix unhandled nil err check to prevent rpc causing a crash (#78)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The unit test failure was due to missing bgp graceful restart select
defer time configuration in voq_chassis.conf. Modified sample output
data file voq_chassis.conf to include this configuration.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
Changes to allow starting per asic services like swss and syncd only if the platform vendor codedetects the asic is detected and notified. The systemd services ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@
There is also a requirement that management, telemetry, snmp dockers can start even if all asic services are not up.
Why I did it
For VOQ chassis, the fabric cards will have 1-N asics. Also, there could be multiple removable fabric cards. On the supervisor, swss and syncd containers need to be started only if the fabric-card is in Online state and respective asics are detected by the kernel. Using systemd, the dependent services can be in inactive state.
How I did it
Introduce a mechanism where all ASIC dependent service wait on its state to be published via PMON to REDIS. Once the subscription is received, the service proceeds to create respective dockers.
For fixed platforms, systemd is unchanged i.e. the service bring up and docker creation happens in the start()/ExecStartPre routine of the .sh scripts.
For VOQ chassis platform on supervisor, the service bringup skips docker creation in the start() routine, but does it in the wait()/ExecStart routine of the .sh scrips.
Management dockers are decoupled from ASIC docker creation.
For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this
Why I did it
In the config_db.json generated by minigraph "admin_status" attribute is missing for the VOQ inband interface port in the PORT table.
How I did it
Changes done to add admin_status attribute for voq inband interface port, if it exists in the PORT table keys.
Why I did it
There are scenarios that End-of-RIB comes from a part of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has the default value of 360 seconds, FRR would not set up routes and all routes would be removed after reconciliation. This PR reduces the route selection deferral timer so that at least routes to parts of the peers get restored at the point of reconciliation.
Fix#7488
How I did it
Reduce route selection deferral timer for bgp graceful restart to 15 seconds.
Why I did it
BIOS upgrade on rare cases cannot guarantee bus value remain the same on every BIOS release. Ignoring this field in order for pcied not to fail but still verify device id in a different way. The solution is future proof and will not require changes in code when new BIOS version is available
How I did it
Since bus is not a fixed value (it is determined by the bios version) we are ignoring this field, and instead checking if there is a device that match on all other fields that and in addition has a matching device id.
How to verify it
Verify no errors or failures in pcied on different BIOS version with the same code base.
#### Why I did it
Enhance DHCP monitor application following the implementation PR: https://github.com/Azure/sonic-buildimage/pull/7772
#### How I did it
Add the support for monitoring DHCPv6 packets.
#### How to verify it
Install an image with this PR and the implementation PR.
Update sonic-sairedis submodule to include below commits:
84fa50a Revert "[vs]: Start syncd by passing context configuration file and global context index. (#832)" (#859)
736dc3b Remove redudnant mention of platform cisco-8000 (#856)
969ad94 Support for cisco-8000 platform for sonic-sairedis/syncd (#823)
1eacd05 [sairedis] Client/Server add support for SAI stats api (#855)
59fedfa [sairedis] Client/Server support SAI fdb flush api (#853)
5c2aaae [syncd] bulk OID remove requires RID (#854)
7da0894 [sairedis] Client/Server support SAI query API (#848)
443ad36 [sairedis] Style refactor cleanup (#850)
Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
Add needed code to pddf_custom_psu.c to deal with multi PSU and get SN.
How to verify it
Plugin new PSU (3Y) and test,
```
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_serial_num
S0A000X601919000013
root@sonic:/sys/bus/i2c/drivers/psu/9-0050# cat psu_model_name
YESM1300AM
root@sonic:/home/admin# pddf_psuutil mfrinfo
PSU Status Manufacturer ID Model Serial Fan Airflow Direction
PSU1 NOT OK 3Y POWER YESM1300AM S0A000X601919000007 exhaust
PSU2 OK 3Y POWER YESM1300AM S0A000X601919000013 exhaust
```
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
1. Implement FanDrawer-Fan hierarchy.
2. Enable thermalctld, disable pcied.
3. Implement SystemLED in Chassis.
4. Correct Fan direction
5. Implement require Fan APIs for SystemHealthMonitoring.
6. Handle non-ascii character while reading PSU model/serial num.
```
Check if System-health can pass the check and display the SystemLED correctly.
///////// booting, DIAG_LED = GREEN_BLINKING /////////
root@sonic:/tmp# show system-health detail
System is currently booting...
root@sonic:/tmp# cat /sys/class/leds/diag/brightness
5
///////// container_checker fail, DIAG_LED = AMBER /////////
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_AMBER
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: container_checker is not Status ok
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
container_checker Not OK Program
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
--------------- -------- ------
asic Ignored Device
psu.temperature Ignored Device
///////// skip container_checker, DIAG_LED = GREEN /////////
root@sonic:/sys/bus/i2c/devices# vi /usr/share/sonic/device/x86_64-accton_as4630_54te-r0/system_health_monitoring_config.json
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices#
root@sonic:/sys/bus/i2c/devices# show system-health detail
System status summary
System status LED STATUS_LED_COLOR_GREEN
Services:
Status: OK
Hardware:
Status: OK
System services and devices monitor list
Name Status Type
-------------------------- -------- ----------
sonic OK System
rsyslog OK Process
root-overlay OK Filesystem
var-log OK Filesystem
routeCheck OK Program
diskCheck OK Program
container_memory_telemetry OK Program
FAN-1F OK Fan
FAN-1R OK Fan
FAN-2F OK Fan
FAN-2R OK Fan
FAN-3F OK Fan
FAN-3R OK Fan
PSU-1 FAN-1 OK Fan
PSU-2 FAN-1 OK Fan
PSU 1 OK PSU
PSU 2 OK PSU
System services and devices ignore list
Name Status Type
----------------- -------- -------
container_checker Ignored Service
psu.temperature Ignored Device
asic Ignored Device
```
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
lazy_re had an issue when importing sonic-cfggen in another application that
uses re.search(). There is no much improvement of lazy_re today after many
other good optimization work done for sonic-cfggen. It served as a quick
temporary solution.
Some quick test for fast-reboot and warm-reboot done on top of 201911 branch:
Fast-reboot: from ASIC reset to ports in up state:
with lazy_re: 18 sec
without lazy_re: 18 sec
Warm-reboot: LAG restoration time:
with lazy_re: 73 sec
without lazy_re: 72 sec
So, there is no real optimization since the number of sonic-cfggen calls is greatly
reduced in latest SONiC. This means it is time to revert this change.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Move SAI libraries for Broadcom for XGS and DNX families to 5.0.0.6
Included fixes
```
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012192505 | [4.3] Re-encap IPinIP decap packets
CS00012192502 | [3.7.5.2] Start LED shell script execution on all DELL based platforms causing all ports flapping on SAI 3.7.5.2
CS00012191363 | [4.3] Support of memscan thread to detect TCAM parity error
CS00012190932 | [4.3] SAI_PORT_PFC_X_RX_PKTS incremented incorrectly even when no PFC frames are received on that priority
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00011382163 | [4.4] Support warm-boot from 3.5 to 4.3
CS00011318937 | [4.3] MACSec SAI Support for Jericho2c+
CS00011318926 | [4.3] Provide SAI support for Jericho2c+
CS00012195263 | [4.3][5.0][TD3] Packets with broken IP headers received on VLAN interface are not dropped
CS00012195261 | [4.3][5.0][TD3]VLAN tagged IP packet received on untagged interface being routed instead of dropped
CS00012183901 | [4.3][WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012196056 | [4.3.3.8][WARMBOOT] syncd[2584]: segfault at 5616ad6c3d80 ip 00007f61e0c6bc65 sp 00007fff0c5a7a90 error 4 in libsai.so.1.0[7f61e0a95000+3cd8000]
CS00012195262 | [4.3][5.0][TD3] Malformed IP packet(missing IP header) received on a VLAN Interface is flooded to other LVAN members instead of being dropped
CS00012195956 | [4.3.3.8] [TD3]Syncd Crash at brcm_sai_tnl_mp_create_tunnel()
PR 4346163: Add support for AN/LT
```
- Why I did it
To fix failed test cases of Haliburton platform APIs that found on platform_tests script
- How I did it
Add device/celestica/x86_64-cel_e1031-r0/platform.json
Update functions to support python3.7
Add more functions follow latest sonic_platform_base
Fix the bug
- How to verify it
Run platform_tests script
Signed-off-by: Wirut Getbamrung [wgetbumr@celestica.com]
- Why I did it
Update SAI version to 1.19.1. The following was changed:
1. Update license
2. Do not remove and re-apply the same SDK mirror session on LAG
3. FEC fix to support all speeds
4. Improve PG counters performance
5. Fix number of switch priorities for port mirroring
Signed-off-by: Dror Prital <drorp@nvidia.com>
To include:
> e168f1d 2021-07-19 pettershao-ragilenetworks: [python coverage] fix result color bar (Azure/sonic-platform-common#202)
> 87c81de 2021-07-13 Prince George: Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict (Azure/sonic-platform-common#206)
> 4533f82 2021-06-21 ngoc-do: Add a template function that returns list of asics on module (Azure/sonic-platform-common#185)
> 1e860c5 2021-06-18 Aravind Mani: Fix decode error when parsing EEPROM fields (Azure/sonic-platform-common#199)
> 93641f3 2021-06-17 Sujin Kang: Unifying the platform api for get_pcie_aer_stats with PcieBase (Azure/sonic-platform-common#197)
This update includes the following commits
acb5d84 Neetha John 2021-07-20 [configlet] Python3 compatible syntax for extracting a key from the dict (#1721)
9b7c58b arlakshm 2021-07-20 Load the database global_db only once for show cli (#1712)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Why I did it
the motivation for this PR is to add retry_call to several test cases in the community, for example, the following cases:
test_show_platform_fanstatus_mocked
test_show_platform_temperature_mocked
are executing a command once and comparing the output to the expected mock data,
sometimes differences between the mock and the actual are causing the tests to fail.
retry will make these tests more stable.
retry will also be more efficient than sleep which will cause the tests to run longer because sometimes it is not necessary to sleep all that time, retry will only run a function only until it passed.
How I did it
added retry to the docker file
How to verify it
I run the tests with retry on the docker after installing the retry package
Signed-off-by: Sharon Lutati <slutati@nvidia.com>
As per HLD - Azure/SONiC#625
FRR Patches:
0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.
0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.
0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.
Depends on:
Azure/sonic-utilities#1159Azure/sonic-swss#1463
Signed-off-by: Akhilesh Samineni akhilesh.samineni@broadcom.com
Avoid initializing sfp/thermal/components/fan/psu/leds on simx and create vpd_info file on hw_management when we use mellanox simulator platform
- Why I did it
this is a fix for issue in mellanox simulator platforms. the syseepromd failed on the pmon docker. also "decode-syseeprom" failed also
- How I did it
before initializing thermal/components/fan/psu/leds --> check if we are running on simx
creating the vpd_info on the hw_management folder.
- How to verify it
check if syseepromd process was loaded properly on the pmon docker.
decode-syseeprom is working well without errors/warnings
It can be that service is not enabled but UnitFilePreset=enabled (case
for Application Extension):
```
Loaded: loaded (/lib/systemd/system/cpu-report.service; disabled; vendor preset: enabled)
```
This makes existing logic skip enabling the service.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
to prevent python exception error when executing warm-reboot command on mellanox simulator platform
- How I did it
return None on the watchdog python script on cases that watchdog file is not exist
- How to verify it
warm-reboot is running well without the python error. error message will appear on log on these cases.
in order to avoid this error message we can simulate the watchdog on mellanox simulator platform