This PR reduces the number of calls to sonic-cfggen from three per
iteration to one.
The PR also installs jq on the host for use by future scripts if needed.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
The ARP update process was not being started because the supervisor
config directory name had an extra 'd':
'/etc/supervisord/conf.d/arp_update.conf' instead of '/etc/supervisor/conf.d/arp_update.conf'.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Support for multi-ASIC platforms in the swssloglevel command
```
admin@str-acs-1:~$ swssloglevel
Usage: /usr/bin/swssloglevel -n [0 to 3] [OPTION]...
```
* Update to use the env file to get the PLATFORM string.
Printing both snapshot and current counter sets will make it easier to pinpoint
which message type(s) is/are not being relayed. This PR prints both counter sets.
Also, this PR sets gnu11 as the C standard to compile with, to avoid
making code changes when porting to the 201811 branch.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When BGP routes are missing, DHCP packets get relayed over the mgmt
interface. This causes dhcpmon to alert that DHCP packets are
not being relayed. This PR includes the mgmt interface as an uplink
device, so if a DHCP packet is relayed over the mgmt interface, the
regular dhcpmon alert is not issued. Instead, dhcpmon checks the
mgmt interface counters and issues a separate alert about
packets travelling through the mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase the alert grace window from 2 minutes to 3 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
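The alerting rule above can be modelled in a few lines of Python. This is a minimal sketch, not dhcpmon's actual C code. The counter-set names (`vlan_rx`, `uplink_tx`, `mgmt_tx`), the tracked message types, and the alert strings are all hypothetical. Once the 3-minute grace window has elapsed, it compares a counter snapshot against the current counters. Packets that went out only over the mgmt interface produce a separate alert rather than a "not relayed" alert.

```python
import signal
import sys
from typing import Callable, Dict, List

Counters = Dict[str, Dict[str, int]]

# Hypothetical client-originated message types; dhcpmon's real counters differ.
MSG_TYPES = ("DISCOVER", "REQUEST")
GRACE_WINDOW_S = 3 * 60  # alert grace window, raised from 2 to 3 minutes


def _delta(snapshot: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Per-message-type increase between the snapshot and the current counters."""
    return {t: current.get(t, 0) - snapshot.get(t, 0) for t in MSG_TYPES}


def check_relay(snapshot: Counters, current: Counters, elapsed_s: float) -> List[str]:
    """Return alerts for message types received on the VLAN but not relayed upstream."""
    if elapsed_s < GRACE_WINDOW_S:
        return []  # still inside the grace window: no alert yet
    rx = _delta(snapshot["vlan_rx"], current["vlan_rx"])
    uplink = _delta(snapshot["uplink_tx"], current["uplink_tx"])
    mgmt = _delta(snapshot["mgmt_tx"], current["mgmt_tx"])
    alerts = []
    for t in MSG_TYPES:
        if rx[t] <= 0 or uplink[t] > 0:
            continue  # nothing received, or relayed over a regular uplink
        if mgmt[t] > 0:
            # Relayed, but through the mgmt network: separate alert.
            alerts.append(f"{t}: relayed over mgmt interface")
        else:
            alerts.append(f"{t}: not relayed")
    return alerts


def install_sigusr1_handler(vlan: str, get_counters: Callable[[], Counters]) -> None:
    """On SIGUSR1, print the VLAN name followed by the current packet counts."""
    def handler(signum, frame):
        print(vlan, file=sys.stderr)
        for name, counts in get_counters().items():
            print(f"  {name}: {counts}", file=sys.stderr)
    signal.signal(signal.SIGUSR1, handler)
```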
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
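Supervisor reads the event buffer size from the `buffer_size` option of an `[eventlistener:x]` section; the default is 10. The Python sketch below chooses a size using the rule described above and writes it into a supervisord.conf with `configparser`. The section name comes from the `dependent-startup` pool in the log. The sample config contents and the helper names are hypothetical. Note that `configparser` drops comments when it rewrites a file.

```python
import configparser
import io

# Pool name taken from the log lines above ("pool dependent-startup ...").
SECTION = "eventlistener:dependent-startup"


def choose_buffer_size(container: str, many_or_templated_programs: bool) -> int:
    """Buffer sizing rule described in this patch (supervisor's default is 10)."""
    if container == "swss":
        return 100  # the swss buffer was observed to fill up to ~50
    return 50 if many_or_templated_programs else 25


def set_buffer_size(conf_text: str, size: int) -> str:
    """Return conf_text with buffer_size set in the dependent-startup listener section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser[SECTION]["buffer_size"] = str(size)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical minimal section; real supervisord.conf files contain more options.
SAMPLE = """
[eventlistener:dependent-startup]
events=PROCESS_STATE
autostart=true
"""

print(set_buffer_size(SAMPLE, choose_buffer_size("some-container", False)))
```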
* [redis] Use the redis-server and redis-tools packages stored in blob storage, so the
build does not break when the upstream link is broken
* Use curl instead of wget
* Explicitly install dependencies
https://github.com/Azure/sonic-buildimage/issues/5255
Root cause: waiting for Restore count != 0 can lead to a race condition
between the orchagent process and swssconfig.sh.
The Restore count != 0 check is not actually needed: the State DB
cannot have been flushed, because if it had been, neither warm restart
nor swss restart would be in effect.
Remove the radvd Makefile and patch, and change the docker-router-advertiser Dockerfile template to simply install the vanilla radvd package using apt-get.
- In PR https://github.com/Azure/sonic-buildimage/pull/2795, we started building radvd from source and patching it to prevent it from erroring out when advertising an MTU of 9100 which was greater than the MTU size configured on the bridge interface (1500), which was due to a limitation in the 4.9 Linux kernel.
- Master branch is now using Linux kernel 4.19. As of 4.18, the kernel supports setting a bridge MTU to a value > 1500.
- PR https://github.com/Azure/sonic-swss/pull/1393 modified vlanmgrd to take advantage of this, so it now configures the MTU of bridge interfaces in SONiC to the proper size of 9100. Therefore, we no longer need to patch radvd, and without a patch there is no need to build it from source. We can save build time by going back to simply installing the vanilla radvd Debian package in the router-advertiser container.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
The following changes were made:
- Multi-ASIC platforms have two loopback interfaces, Loopback0 and Loopback4096, and IP-in-IP decap entries need to be added for both. Update the ipinip.json.j2 template to add decap entries for Loopback4096.
- Add a corresponding unit test.
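The template needs to collect the addresses of both Loopback0 and Loopback4096 so that each one gets a decap entry. This Python sketch shows that logic. It assumes CONFIG_DB-style `LOOPBACK_INTERFACE` keys of the form `Loopback0|10.1.0.32/32`. The example addresses and the output shape (addresses grouped by IP version and comma-joined) are illustrative assumptions, not the actual ipinip.json.j2 output.

```python
import ipaddress
from typing import Dict, Iterable, List

DECAP_LOOPBACKS = ("Loopback0", "Loopback4096")  # multi-ASIC platforms have both


def decap_dst_ips(loopback_keys: Iterable[str]) -> Dict[int, str]:
    """Map IP version -> comma-joined loopback addresses that need a decap entry."""
    by_version: Dict[int, List[str]] = {}
    for key in loopback_keys:
        name, sep, prefix = key.partition("|")
        if not sep or name not in DECAP_LOOPBACKS:
            continue  # skip bare interface keys and unrelated loopbacks
        iface = ipaddress.ip_interface(prefix)
        by_version.setdefault(iface.version, []).append(str(iface.ip))
    return {v: ",".join(ips) for v, ips in by_version.items()}


keys = [
    "Loopback0|10.1.0.32/32",
    "Loopback0|fc00:1::32/128",
    "Loopback4096|8.0.0.0/32",  # hypothetical address
    "Loopback4096",
]
print(decap_dst_ips(keys))
```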
Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1. Configure the mode while building an image:
`make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure the mode while the device is running:
Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
Then restart swss with `systemctl restart swss`
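To script the runtime option, this Python sketch builds the same CONFIG_DB update as the `sonic-cfggen` command above, plus the follow-up swss restart. Both are returned as argument lists suitable for `subprocess.run`. The helper name is hypothetical, and the commands are only constructed here, not executed. Using the value `"disable"` for async mode is an assumption; the text above shows only `"enable"`.

```python
import json
import shlex
from typing import List


def sync_mode_commands(enable: bool) -> List[List[str]]:
    """Build the CONFIG_DB write and swss restart shown above; nothing is executed."""
    # "disable" for async mode is an assumption; the text above only shows "enable".
    mode = "enable" if enable else "disable"
    patch = {"DEVICE_METADATA": {"localhost": {"synchronous_mode": mode}}}
    return [
        ["sonic-cfggen", "-a", json.dumps(patch), "--write-to-db"],
        ["systemctl", "restart", "swss"],
    ]


for argv in sync_mode_commands(enable=True):
    print(shlex.join(argv))  # shell-quoted form of each command
```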
**- Why I did it**
PR https://github.com/Azure/sonic-buildimage/pull/4599 introduced two bugs in the startup of the router advertiser container:
1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed
2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read.
**- How I did it**
1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh`
2. Use the Jinja2 "namespace" construct to fix the scope issue
**- How to verify it**
Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).
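The supervisor template has to decide whether at least one VLAN has an IPv6 address. This Python sketch shows that condition. In the Jinja2 template itself, the flag has to live on a `namespace()` object (for example `ns.ipv6_found`), because a plain `{% set %}` inside a `for` loop does not survive the loop. The function name and the CONFIG_DB-style `VLAN_INTERFACE` key format (`Vlan1000|fc02:1000::1/64`) are assumptions for illustration.

```python
import ipaddress
from typing import Iterable


def ipv6_found(vlan_interface_keys: Iterable[str]) -> bool:
    """True if at least one VLAN interface has an IPv6 address assigned."""
    for key in vlan_interface_keys:
        _vlan, sep, prefix = key.partition("|")
        # Bare "Vlan1000" keys carry no address and are skipped.
        if sep and ipaddress.ip_interface(prefix).version == 6:
            return True
    return False
```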
Calls to sonic-cfggen are CPU-expensive. This PR reduces calls to
sonic-cfggen to one call during startup of the radv service.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Calls to sonic-cfggen are CPU-expensive. This PR reduces calls to
sonic-cfggen to one call during startup of the swss service.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Calls to sonic-cfggen are CPU-expensive. This PR reduces calls to
sonic-cfggen to two calls during startup of the frr service.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Calls to sonic-cfggen are CPU-expensive. This PR reduces calls to
sonic-cfggen to one call during startup of the dhcp-relay
service.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Calls to sonic-cfggen are CPU-expensive. This PR reduces calls to
sonic-cfggen to a single call during snmp startup.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
This is part of migrating all Python-based package installers from Debian packages to wheel format, which will also make it easy to build Python 3 versions of the packages in the near future. ledd and psud were converted in earlier PRs; this PR converts the remainder:
- pcied
- syseepromd
- thermalctld
- xcvrd
This is part of migrating all Python-based package installers from Debian packages to wheel format, which will also make it easy to build a Python 3 version of the package in the near future.
- Also remove some references to sonic-daemon-base which I previously missed and add missing sonic-py-common dependency for sonic-pcied.
* [platform] Add Support For Environment Variable
This PR adds the ability to read an environment file from /etc/sonic.
The file contains immutable SONiC config attributes such as platform,
hwsku, version and device_type. The aim is to minimize the calls made
into sonic-cfggen during boot.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
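Reading such a file requires no sonic-cfggen call. Below is a minimal Python sketch that assumes simple `KEY=VALUE` lines. The key names (`PLATFORM`, `HWSKU`, `SONIC_VERSION`, `DEVICE_TYPE`) and the sample values are assumptions for illustration. The file name under /etc/sonic is not given here, so the parser takes the file contents as a string.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')  # drop optional double quotes
    return env


# Hypothetical contents of the environment file.
SAMPLE_ENV = """
SONIC_VERSION=master.0-abc
PLATFORM=x86_64-example-r0
HWSKU=Example-HWSKU
DEVICE_TYPE=ToRRouter
"""

env = parse_env_file(SAMPLE_ENV)
print(env["PLATFORM"])
```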
* Changes to add template support for copp.json.
This is needed so that we can install different types of
traps based on device role (ToR/Leaf/Mgmt/etc.).
The initial use case is to install the DHCP/DHCPv6 traps only
on ToR routers.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
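The role-based selection can be sketched in Python: the DHCP/DHCPv6 traps are added only when the device type is a ToR router. The trap names and the `ToRRouter` type string are assumptions for illustration, not taken from copp.json. The PR itself implements this in the copp.json template, not in Python.

```python
from typing import List

# Hypothetical trap lists; the real copp.json defines its own traps and groups.
COMMON_TRAPS = ["bgp", "lacp", "lldp"]
TOR_ONLY_TRAPS = ["dhcp", "dhcpv6"]


def traps_for_role(device_type: str) -> List[str]:
    """Return the traps to install for a given device role."""
    traps = list(COMMON_TRAPS)
    if device_type == "ToRRouter":
        traps += TOR_ONLY_TRAPS  # initial use case: DHCP traps only on ToR
    return traps
```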
* Fixed based on review comments.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Fixed based on review comment.
Verify that /etc/apt/sources.list points to buster using `docker exec bgp cat /etc/apt/sources.list`.
The BGP neighborship is established:
```
root@sonic:~# show ip bgp summary
IPv4 Unicast Summary:
BGP router identifier 10.1.0.1, local AS number 65100 vrf-id 0
BGP table version 1
RIB entries 1, using 184 bytes of memory
Peers 1, using 20 KiB of memory
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
6.1.1.1         4        100      96      96        0    0    0 01:32:04            0
Total number of neighbors 1
root@sonic:~#
```
Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>
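The neighborship check can be automated by parsing `show ip bgp summary`. In FRR's summary table, the last column holds the received prefix count once a session is Established. Otherwise it holds the session state name (for example `Active`). A numeric value therefore means the session is up. This is a minimal Python sketch. The function name is hypothetical, and the parsing assumes the column layout shown above.

```python
from typing import Dict


def established_neighbors(summary: str) -> Dict[str, bool]:
    """Map neighbor address -> True if its State/PfxRcd column is numeric (Established)."""
    result: Dict[str, bool] = {}
    in_table = False
    for line in summary.splitlines():
        fields = line.split()
        if fields and fields[0] == "Neighbor":
            in_table = True  # header row of the neighbor table
            continue
        if in_table:
            if len(fields) < 10:
                in_table = False  # blank or trailer line ends the table
                continue
            result[fields[0]] = fields[-1].isdigit()
    return result
```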
Copy the correct fancontrol config file to the proper destination. Also includes some minor refactoring for code reuse to help prevent issues like this in the future.
Fixes a bug introduced by #4599
fixes https://github.com/Azure/sonic-buildimage/issues/5026
Explanation:
In the log from the issue, I found the following:
```
Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
```
Analyzing the source code, I found that this error message can be issued only when `zclient_send_rnh()` returns a value less than 0:
```
ret = zclient_send_rnh(zclient, command, p, exact_match,
                       bnc->bgp->vrf_id);
/* TBD: handle the failure */
if (ret < 0)
        flog_warn(EC_BGP_ZEBRA_SEND,
                  "sendmsg_nexthop: zclient_send_message() failed");
```
I checked [zclient_send_rnh()](88351c8f6d/lib/zclient.c (L654)) and found that this function returns the exit code it gets from [zclient_send_message()](88351c8f6d/lib/zclient.c (L266)). The latter function can return a non-zero value in two cases:
1. bgpd has not connected to the zclient socket yet [code](88351c8f6d/lib/zclient.c (L269))
2. The socket was closed. In that case, however, we would also see that function's error message in the log (and I do find that message in the log when SONiC reboots). [code](88351c8f6d/lib/zclient.c (L277))
The logs also show that the client connections were set up after bgpd hit the issue:
Bgpd.log
```
Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
```
Vs
Zebra.log
```
Jul 22 21:13:12.713249 vlab-01 NOTICE bgp#zebra[48]: client 25 says hello and bids fair to announce only static routes vrf=0
Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 30 says hello and bids fair to announce only bgp routes vrf=0
Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 33 says hello and bids fair to announce only vnc routes vrf=0
```
So zebra should be started first, and bgpd and the other daemons should be started only after zebra is up.
**- How I did it**
I changed the startup graph so that the daemons start in the following order:
1. Start zebra first.
2. Then start staticd and bgpd.
3. Start vtysh -b and bgpeoi once bgpd has started.
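The new startup order forms a small dependency graph. The Python sketch below expresses it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It shows that zebra always comes first, followed by two waves of daemons that may start in parallel. The program names are illustrative labels, including `vtysh_b` standing in for `vtysh -b`. This only models the ordering; it is not the mechanism the container uses to start the daemons.

```python
from graphlib import TopologicalSorter

# program -> set of programs that must be running first (names are illustrative)
STARTUP_DEPS = {
    "zebra": set(),
    "staticd": {"zebra"},
    "bgpd": {"zebra"},
    "vtysh_b": {"bgpd"},  # stands for "vtysh -b"
    "bgpeoi": {"bgpd"},
}


def startup_waves(deps):
    """Group programs into waves; every program in a wave may start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(startup_waves(STARTUP_DEPS))
# [['zebra'], ['bgpd', 'staticd'], ['bgpeoi', 'vtysh_b']]
```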
Optimize the number of calls made to sonic-cfggen during service
startup, as each call adds to the total system boot time.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
**- Why I did it**
Each sonic-cfggen call is slow and adds to system startup time.
**- How I did it**
Placed all required variables into a single template and called sonic-cfggen once with this template.
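A Python sketch of the pattern: render one template that emits every needed variable (here as JSON), then build the orchagent command line from that single result. This replaces one sonic-cfggen call per variable. The JSON output format, the `mac` key, and the helper are hypothetical. The flags and values mirror the orchagent command line shown in the tests below. In this sketch only the MAC is assumed to come from the rendered template.

```python
import json
from typing import Dict, List

# Hypothetical output of ONE sonic-cfggen run over a template that gathers all vars.
RENDERED = '{"mac": "f4:8e:38:16:bc:8d"}'


def orchagent_argv(rendered: str) -> List[str]:
    """Build the orchagent command line from a single rendered template."""
    cfg: Dict[str, str] = json.loads(rendered)
    # -d and -b values copied from the command line in the tests; only the MAC
    # is read from the rendered template in this sketch.
    return ["/usr/bin/orchagent", "-d", "/var/log/swss", "-b", "8192", "-m", cfg["mac"]]


print(" ".join(orchagent_argv(RENDERED)))
```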
**- How to verify it**
***-Test 1***
There is an average saving of 0.5 to 1 sec between the old script and the new script:
```
root@str-s6000-acs-14:/# time ./orchagent_old.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
real 0m3.546s
user 0m2.365s
sys 0m0.585s
root@str-s6000-acs-14:/# time ./orchagent_new.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
real 0m2.058s
user 0m1.650s
sys 0m0.363s
```
***-Test 2***
Built an image with this change; orchagent is running with the intended parameters:
```
admin@str-s6000-acs-14:~$ ps -ef | grep orchagent
root 2988 1901 1 02:09 pts/0 00:00:02 /usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
```
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Optimize the number of calls made to sonic-cfggen during service
startup, as each call adds to the total system boot time.
***-Test 1***
There is an average saving of 1 to 1.5 sec between the old script and the new script:
```
root@str-s6000-acs-14:/# time /usr/bin/rest-server-old.sh
Generating temporary TLS server certificate ...
2020/07/09 19:03:33 wrote cert.pem
2020/07/09 19:03:33 wrote key.pem
REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
/usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
real 0m8.790s
user 0m7.993s
sys 0m0.584s
root@str-s6000-acs-14:/# time /usr/bin/rest-server-new.sh
Generating temporary TLS server certificate ...
2020/07/09 19:03:45 wrote cert.pem
2020/07/09 19:03:45 wrote key.pem
REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
/usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
real 0m6.940s
user 0m5.670s
sys 0m0.386s
```
***-Test 2***
Built an image with this change; the REST server is running with the parameters shown in Test 1 above:
```
admin@str-s6000-acs-14:~$ ps -ef | grep rest_server
root 3301 2866 2 02:09 pts/0 00:00:10 /usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
```
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
The telemetry regression test needs a gNMI client in the ptf docker, while the gNMI server runs on the SONiC DUT. With the client in place, pytest can invoke gnmi_get from the ptf docker to verify whether the gNMI server streams data successfully.
The template was referenced with a path relative to the script, which could
result in errors when the script is run from root. Add an explicit
path to the template file name.
Also, move the telemetry_var template to the template dir,
and remove the double quotes from around the JSON dict.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
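The fix amounts to making the template path independent of the working directory. The Python sketch below shows the difference: a bare relative name resolves against whatever directory the script is run from, while a name joined to an absolute directory does not. The template directory and file name are hypothetical.

```python
import os

# Hypothetical absolute location of the templates directory.
TEMPLATE_DIR = "/usr/share/sonic/templates"


def template_path(name: str, template_dir: str = TEMPLATE_DIR) -> str:
    """Return an absolute template path that does not depend on the current directory."""
    path = os.path.join(template_dir, name)
    if not os.path.isabs(path):
        raise ValueError(f"template dir must be absolute: {template_dir!r}")
    return path


print(os.path.abspath("telemetry_vars.j2"))  # changes with the current directory
print(template_path("telemetry_vars.j2"))    # always the same
```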
* [mgmt docker] Move pycryptodome installation to the end of the docker build
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Pin the version to the current one: 3.9.8
* comment
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
All new NAT conntrack entries are added to the kernel with the maximum entry timeout of 432000, and the same timeout is also set during system warm reboot.
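The timeout value converts as follows, assuming it is expressed in seconds as kernel conntrack timeouts are:

```python
TIMEOUT_S = 432000  # NAT conntrack entry timeout, assumed to be in seconds
days, rem = divmod(TIMEOUT_S, 24 * 3600)
print(f"{TIMEOUT_S} s = {days} days, {rem} s")  # 432000 s = 5 days, 0 s
```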
sonic-cfggen calls are slow, and they take place during the SONiC
boot process. This change assembles all required vars into a
single template file, so telemetry now calls sonic-cfggen
only once.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>