- Finish the refactor of the smbus backend (huge performance enhencement)
- Split scd-hwmon kernel module in multiple source files
- Add new scd-uart driver to manage linecard consoles within a chassis
- Bootstrap of thermal platform API implementation
- Fix psud led error message
- Fix fan led color on Smartsville
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
* [redis] Use redis-server and redis-tools in blob storage to prevent
upstream link broken
* Use curl instead of wget
* Explicitly install dependencies
* [sonic-utilities]update submodule with fix
This PR addresses fixes in sonic-py-common to imitate the behavior inside
sonic-cfggen. Essentially this is a fix for accessing the port-config file.
First check if there is a platform.json file for config generation
and then for legacy port_config.ini.
Also updating the sub-module sonic-utilities.
Fix pfcwd stats crash with invalid queue name (#1077)
[show][bgp]Display the Total number of neighbors in the show ip bgp(v6) summary. (#1079)
[config] Update SONiC Environment Vars When Loading Minigraph (#1073)
Multi asic platform changes for interface, portchannel commands (#878)
Update Command-Reference.md (#1075)
[filter-fdb] Fix Filter FDB With IPv6 Present in Config DB (#1059)
[config] Remove _get_breakout_cfg_file_name helper function (#1069)
[SHOW][BGP] support show ip(v6) bgp summary for multi asic platform (#1064)
[fanshow] Display other fan status, such as Updating (#1014)
Add ip_prefix len based on proxy_arp status (#1046)
Enable the platform specific ssd firmware upgrade during reboot (#954)
[show][cli[show interface portchannel support for Multi ASIC (#1005)
support show interface commands for multi ASIC platforms (#1006)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
https://github.com/Azure/sonic-buildimage/issues/5255
Root Cause: Waiting on Restore count != 0 can lead to race condition
between orchagent process and swssconfig.sh.
Ideally check of Restore count != 0 is not needed as the State DB
cannot be flushed as if it was flushed then Warm Restart or swss-restart
should not be true also.
as needed by SAI 3.7 and above. Without this change
Warmboot fails from 3.5 to 3.7 as Braodcoam Datastructure
gets corrupted after warm-boot.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Issue: Binary ebtables config file is CPU arch dependent
Fix: Load the text config during firsttime boot and
Generate the binary persistent atomic file
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
sonic-cfggen is now using Unix Domain Socket for Redis DB. The socket
is created using root account. Subsequently, services that are started
as admin fails to start. This PR creates redis group and add admin
user to redis group. It also grants read/write access on redis.sock
for redis group members.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
[schema] Make schema header support C project (#373)
Removed DB specific get api's from Selectable class (#378)
With the change as part of #378 caclmgrd need to be updated
to use new client side Get API to access namespace.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Remove radvd Makefile and patch, change docker-router-advertiser Dockerfile template to simply install the vanilla radvd package using apt-get.
- In PR https://github.com/Azure/sonic-buildimage/pull/2795, we started building radvd from source and patching it to prevent it from erroring out when advertising an MTU of 9100 which was greater than the MTU size configured on the bridge interface (1500), which was due to a limitation in the 4.9 Linux kernel.
- Master branch is now using Linux kernel 4.19. As of 4.18, the kernel supports setting a bridge MTU to a value > 1500.
- PR https://github.com/Azure/sonic-swss/pull/1393 modified vlanmgrd to take advantage of this and now configures the MTU of bridge interfaces in SONiC to the proper size of 9100. Therefore, we no longer need to patch radvd. Since we no longer need to patch radvd, we no longer need to build it from source, so we can save build time by going back to simply installing the vanilla radvd Debian package in the router-advertiser container.
Copying platform.json file into an empty /usr/share/sonic/platform directory does not mimic an actual device. A more correct approach is to create a /usr/share/sonic/platform symlink which links to the actual platform directory; this is more like what is done inside SONiC containers. Then, we only need to copy the platform.json file into the actual platform directory; the symlink takes care of the alternative path, and also exposes all the other files in the platform directory.
sonic-py-common package relies on the `PLATFORM` environment variable to be set at runtime in the SONiC VS container. Exporting the variables in the start.sh script causes the variables to only be available to the shell running start.sh and any subshells it spawns. However, once the script exits, the variable is lost. This is resulting in the failure of tests which are run in the VS container, as they call applications which in turn call sonic-py-common functions which rely on PLATFORM to be set.
Setting the environment variables in the Dockerfile allows them to persist through the entire runtime of the container.
The cmd "mclagdctl dump state" would goes wrong when there are two or more mclag_groups configured. The field "MCLAG Interface" can not be displayed in some group.
Signed-off-by: Sun Dandan <sundandan@asterfusion.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
The following changes are done.
- Multi asic platform have 2 Loopback interfaces, Loopback0 and Loopback4096. IPinIP decap entries need to be added for both of them. Update the ipinip.json.j2 template to add decap entries for Loopback4096.
- Add corressponding unit test
The pcie-check.sh script was added in https://github.com/Azure/sonic-buildimage/pull/4771, but was not given executable permission. Therefore, we would see messages like:
```
Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied
Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied
Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'.
```
Update the patform daemons with new commits
commit e9628b6b5a4683c7366fc99ee19ce727546fbabb
Merge pull request #64 from judyjoseph/multi_asic_ledd_xcvr
* platform daemon (Xcvrd, Ledd) changes for multi asic platform
* Updates in ledd daemon to use namespaces and get the namespace from selector object.
* Updates to xcvrd daemon to use the asic_id in talking to the right DB.
* Updated based on new sonic-py-common API's
* Invoke initializeGlobalConfig() in the SfpUpdate/DomInfoUpdate processes as well.
commit 415b8c457625c514aff0f8ecbdbbb655414d8067
[thermalctld] Optimize the thermal policy loop to make it execute every 60 seconds (#77)
commit 3d1f3196fd9c9942134e4926de7d248743e9589d
Update FAN_INFO in psud to avoid inconsistant output of show platform psud and show platform fan (#81)
Updating platform-common submodule with these commits
commit 14c6e53ecb861e124e2b45a7b65875ffac1b949e
[sff8472.py] Make hex keys all lowercase (#115)
Alpha chars in hex-based keys should be lowercase
commit b60f46cd1fb0ced1ffbff382e0125517f8c74b9e
Sfputil base and helper class changes for multi-ASIC (#100)
* Sfputil base and helper class changes for multi-ASIC
> adding the logical interface to asic id mapping
* Updated based on new sonic-py-common API's.
management framework provides management plane services like rest and
CLI which is not needed right after boot, instead by delaying this
service we give some more CPU for data plane and control plane services
on fast/warm boot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
New attribute 'has_timer' introduced to init_cfg.json does not evaluate
as Bool, rather it evaluates as string. This PR fixes this issue. Also,
this PR fixes an issue when there is system config unit (snmp, telemetry) that
has no installation config (WantedBy=, RequiredBy=, Also=, Alias=) settings
in the [Install] section. In the latter case, the .service should not be enabled.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
This PR enables cfggen to readr/write from Redis DB using pipelines.
Pipelines enables batch read/write from/to Redis DB.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Change:
c25d492 Merge pull request #83 from tahmed-dev/taahme/add-redis-pipeline-operation
198d143 review comments - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
994851c review comments - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
2d2b7e1 making lgtm happy - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
fa9093c [configdb] Add Ability to Query/Update Redis Using Pipelines
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1. Configure mode while building an image
`make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running
Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
Restart swss with `systemctl restart swss`
**- Why I did it**
To install the framework for adding unit tests to the sonic-py-common package and report coverage.
** How I did it **
- Incorporate pytest and pytest-cov into sonic-py-common package build
- Updgrade version of 'mock' installed to version 3.0.5, the last version which supports Python 2. This fixes a bug where the file object returned from `mock_open()` was not iterable (see https://bugs.python.org/issue32933)
- Add support for Python 3 setuptools and pytest in sonic-slave-buster environment
- Add tests for `device_info.get_machine_info()` and `device_info.get_platform()` functions
- Also add a .gitignore in the root of the sonic-py-common directory, move all related ignores from main .gitignore file, and add ignores for files and dirs generated by pytest-cov
The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97f1e. On systems that are misaligned before conversion
(partition start is the first sector), the relica partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.
Zeroing this old relica makes sure that there is nothing left of the old
partition lying around. There won't be any risk of having Aboot corrupt
the new filesystem because of the old relica.
Signed-off-by: Baptiste Covolato <baptiste@arista.com>
* src/sonic-swss d2bab10...c4949a2 (34):
> [dvs] Add new common issues and TOC to DVS README (#1405)
> Avoid adding loopback interface (ip link add) when setting nat zone on loopback interface (#1411)
> [portsorch] add buffer drop FC group (#1368)
> [dvs/chassis] Bring up SONiC interfaces in virtual chassis (#1410)
> [chassis/dvs] Add support for virtual chassis to DVS testbed (#1345)
> [sonic-swsss] Fix the issue of field "next_hop_ip" not getting updated in state DB in ERSPAN Mirror (#1375)
> [intfmgr] Fix OA crash issue due to link local configurations (#1195)
> Fix the issue when persistent DVS is used to run pytest which has number of front-panel ports < 32 (#1373)
> [dvs] Refactor AsicDbValidator (#1402)
> [fec] Get FEC mode when port is already admin down (#1403)
> [fec] added logic that put port down before applying fec onfiguration (#1399)
> [dvs] Add performance test for adding and deleting routes (#1392)
> Ignore IPv6 link-local and multicast entries as Vnet routes (#1401)
> [vlanmgr] Support Jumbo Frame By Default (#1393)
> Fix log/syslog not being correct when last test fails for given module (#1395)
> Get initial speed from ASIC DB (#1390)
> [dvs] Add options to limit CPU usage (#1394)
> [intfsorch] Retrieve Port object before setting NAT zone on router interfaces. (#1372)
> [.gitignore] Ignore gearsyncd binary (#1381)
> Added Max Nexthopgroup/ECMP Count supported by device into State DB. (#1383)
> [dvs] Upload logs even if failure occurs during startup (#1389)
> [rates] fix issue with rates init (#1387)
> [dvs] Validate that SWSS is ready to receive input before starting tests (#1385)
> [dvs] Convert sflow and speed tests to use dvslib (#1382)
> [dvs_acl] Refactor and document dvs_acl library (#1378)
> [dvs] Fix install instructions in README (#1379)
> [dvs] Update README with new flags, options, and known issues (#1380)
> swss: gearsyncd should return 0 on exit (#1376)
> Remove 00-copp.config.json from swss debian package. (#1366)
> fix undefined var in rates lua scripts (#1365)
> [fdborch] Fixed Orchagent crash in FDB flush on port disable. (#1369)
> [tlm_teamd]: Try to add LAG again, when teamd is not ready first time (#1347)
> [vs] Incorporate python3 best practices into DVSLib (#1357)
> [dvs] Mark unstable tests as xfail (#1356)
- Why I did it
When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality:
Image management
Configuration save and load
ZTP enable/disable and status
Show tech support
- How I did it
The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor.
This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality.
- How to verify it
- Description for the changelog
Add SONiC Host Service for container to execute select commands in host
Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>
* src/sonic-utilities d5fdd74...17fb378 (7):
> [sonic-installer] Import re module (#1061)
> [fast-reboot]: Fix fail to execute fast-reboot problem (#1047)
> [config] Reduce Calls to SONiC Cfggen (#1052)
> [filter-fdb] Call Filter FDB Main From Within Test Code (#1051)
> [sflow_test.py]: Fix show sflow display. (#1054)
> Change fast-reboot script to use swss and radv service script (#1036)
> Common functions for show CLI support on multi ASIC (#999)
**- Why I did it**
PR https://github.com/Azure/sonic-buildimage/pull/4599 introduced two bugs in the startup of the router advertiser container:
1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed
2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read.
**- How I did it**
1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh`
2. Use the Jinja2 "namespace" construct to fix the scope issue
**- How to verify it**
Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).
Commit e484ae9dd introduced systemd .timer unit to hostcfgd.
However, when stopping service that has timer, there is possibility that
timer is not running and the service would not be stopped. This PR
address this situation by handling both .timer and .service units.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>