* Support for multi-asic platform for swssloglevel command
admin@str-acs-1:~$ swssloglevel
Usage: /usr/bin/swssloglevel -n [0 to 3] [OPTION]...
* Update to use the env file to get the PLATFORM string.
* [platform] Add Support For Environment Variable
This PR adds the ability to read environment file from /etc/sonic.
the file contains immutable SONiC config attributes such as platform,
hwsku, version, device_type. The aim is to minimize calls being made
into sonic-cfggen during boot time.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
implements a new feature: "BGP Allow list."
This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
SWSS config script restore ARP/FDB/Routes. Restore neighbor script
uses config DB ARP information to restore ARP entries and so needs
to be started after swssconfig exits.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.
- Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Install a newer version of rsyslog from stretch-backports to support -iNONE
Previous backport from master use -iNONE option which is only
available after v8.32.0
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Printing both snapshot and current counter sets will make it easier to pinpoint
which message type(s) is/are not being relayed. This PR prints both counter sets.
Also, this PR defines gnu11 as a C standard to compile with in order to avoid
making changes when porting to 201811 branch.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When BGP routes are missing, DHCP packets get relayed over mgmt
interface. This results in dhcpmon alerting that DHCP packets are
not being relayed. This is PR include mgmt interface as uplink
device, and so, if DHCP packet gets relayed over mgmt interface,
regular dhcpmon alert will not be issues. Instead, dhcpmon will
check the mgmt interface counts and issue a separate alert regarding
packets travelling through mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase alert grace window to 3 minutes from currently 2 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Arp update process was not being started due to an issue with
the directory name having an extra 'd' in supervisor as in
'/etc/supervisord/conf.d/arp_update.conf'.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
**- Why I did it**
PR https://github.com/Azure/sonic-buildimage/pull/4599 introduced two bugs in the startup of the router advertiser container:
1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed
2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read.
**- How I did it**
1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh`
2. Use the Jinja2 "namespace" construct to fix the scope issue
**- How to verify it**
Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).
https://github.com/Azure/sonic-buildimage/issues/5255
Root Cause: Waiting on Restore count != 0 can lead to race condition
between orchagent process and swssconfig.sh.
Ideally check of Restore count != 0 is not needed as the State DB
cannot be flushed as if it was flushed then Warm Restart or swss-restart
should not be true also.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
The following changes are done.
- Multi asic platform have 2 Loopback interfaces, Loopback0 and Loopback4096. IPinIP decap entries need to be added for both of them. Update the ipinip.json.j2 template to add decap entries for Loopback4096.
- Add corressponding unit test
For telemetry regression test we need gnmi client to be present on ptfdocker. Gnmi-server will be present on SONiC DuT. Further, we can access gnmi_get from ptfdocker inside pytest to verify gnmi server streaming data successfully or not.
When telemetry is in secure mode ,the monitor will have error log of the match string "--insecure". So I modify to be compatiable with insecure mode and secure mode.
Co-authored-by: Ubuntu <ubuntu@ip-10-5-1-21.ap-south-1.compute.internal>
The template is referenced relative to the script path and this could
results in errors in case script is run from root. Add explicit
path to the template file name.
Also, moving telemetry_var template to template dir.
And remove double quotes from around json dict.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
when portsyncd starts, it first enumerates all front panel ports
and marks them as old interfaces. Then, for new front panel ports
it checks if their indexes exist in previous sets. If yes, it will
treats them as old interfaces and ignore them.
The reason we have this check is because broadcom SAI only removes
front panel ports after sai switch init.
So, if portsyncd starts after orchagent, new interfaces could be
created before portsyncd and treated as old interface.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Copy proper fancontrol config file to the proper destination. Also some minor refactoring for code reuse to help prevent issues like this in the future.
Fixes a bug introduced by #4599
* Changes to add template support for copp.json.
This is needed so that we can install differnt type of
Traps based on Device Role (Tor/Leaf/Mgmt/etc...).
Initial use case is to install DHCP/DHCPv6 tarp only
for tor router.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fixed based on review comments.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fixed based on review comment.
Optimizing number of calls made to sonic-cfggen during service
start up as it adds to total system boot up time.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
**- Why I did it**
sonic-cfggen call is slow and it adds to system start up time
**- How I did it**
places all required variable into single template and called into sonic-cfggen using this template
**- How to verify it**
***-Test 1***
there is an average saving of .5 to 1 sec between old script and new script
```
root@str-s6000-acs-14:/# time ./orchagent_old.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
real 0m3.546s
user 0m2.365s
sys 0m0.585s
root@str-s6000-acs-14:/# time ./orchagent_new.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
real 0m2.058s
user 0m1.650s
sys 0m0.363s
```
***-Test 2***
Built an image with this change and orchagent is running with intended params:
```
admin@str-s6000-acs-14:~$ ps -ef | grep orchagent
root 2988 1901 1 02:09 pts/0 00:00:02 /usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
```
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
All new NAT conntrack entries are added to kernel with max entry timeout of 432000 and setting the same timeout during system warm reboot also
File "/usr/local/bin/sonic-cfggen", line 380, in <module>
main()
File "/usr/local/bin/sonic-cfggen", line 354, in main
print(template.render(data))
File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "<template>", line 1, in top-level template code
File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 471, in getattr
return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'WARM_RESTART' is undefined
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* [mgmt docker] move pycryptodome installation to the end of the docker building
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* pin down the version to current: 3.9.8
* comment
sonic-cfggen call is slow and this is taking place in the SONiC
boot up process. The change uses templates to assemble all required
vars into single template file. With this change, telemetry now calls
once into sonic-cfggen.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>