[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.
#### Why I did it
On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.
However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.
To avoid the false alert, improve the monitor to wait and re-check.
Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the serial-getty@ttyS1.service service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'
#### How I did it
Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.
#### How to verify it
Pass all UT.
Manually check fixed code work correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```
syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```
#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Why I did it
Cherry-pick #14601, for code conflict.
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.
SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Work item tracking
Microsoft ADO (number only): 17894593
How I did it
How to verify it
Why I did it
[Submodule][202211] Advance sonic-restapi pointer
The branch 202012 has already updated to commit 47e4b53.
4f6f979 Fix the redis security issue CVE-2023-28858 and CVE-2023-28859 (#139)
47e4b53 Fix adv_pfx len for ipv6 (#135)
44121be Support ipv6 prefix lenght greater than 64 and check for adv_prefix (#134)
99c467d Add API support for adv prefix and custom monitoring (#133)
347684a Use github code scanning instead of LGTM (#132)
86543d0 Updates to route PATCH API (#129)
a1af82c Install libyang to azure pipeline (#128)
2007c4c Increase coverage threshold (#126)
Work item tracking
Microsoft ADO (number only): 17705422
How I did it
How to verify it
Why I did it
Fixes#14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage
It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.
How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.
How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py
The test set the STATIC_ROUTE entry in conifg db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23
"STATIC_ROUTE": {
"2.2.2.0/24": {
"nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
}
},
Validate that the arp_update gets the proper ARP_UPDATE_VARDS using arp_update_vars.j2 template from config db and does not crash:
{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }
validate route/test_static_route.py testcase pass.
Why I did it
Optimize the version control for Debian packages.
Fix sonic-slave-buster/sources.list.amd64 not found display issue, need to generate the file before running the shell command to evaluate the sonic image tag.
When using the snapshot mirror, it is not necessary to update the version file based on the base image. It will reduce the version dependency issue, when an image is not run when freezing the version.
How I did it
Not to update the version file when snapshot mirror enabled.
How to verify it
Update sonic-swss-common submodule pointer to include the following:
* 6e4daf1 Revamp module build script to make it work for 5.15 on Ubuntu 20.04 (sonic-net/sonic-swss-common#720)
* 7f40cde Non recursive automake and Debian packaging changes (sonic-net/sonic-swss-common#700)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
sonic-slave-stretch build failed for mmh3 version update to 3.10 on Mar 24.
How I did it
Enable reproducible build for vhdx image.
How to verify it
Update sonic-py-common, add missing dependency to redis-dump-load.
This is manually cherry-pick PR for https://github.com/sonic-net/sonic-buildimage/pull/14347
After 202211, the redis-dump-load been patched by sonic, so can't cherry-pick master branch PR to 202211 branch.
#### Why I did it
The script sonic_db_dump_load.py in sonic-py-common is depends on redis-dump-load, however the dependency is missing.
#### How I did it
Add redis-dump-load dependency.
#### How to verify it
Pass all E2E test case.
#### Description for the changelog
Update sonic-py-common, add missing dependency to redis-dump-load.
Why I did it
Fix some of vulnerability issue relative python packages #14269
Pillow: [CVE-2021-27921]
Wheel: [CVE-2022-40898]
lxml: [CVE-2022-2309]
How I did it
How to verify it
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.
How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.
How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.
Signed-off-by: dojha <devojha@microsoft.com>
Provide platform-components.json for Clearwater2 and Wolverine
These files are needed for fwutil platform sonic-mgmt tests to pass.
Fix PikeZ platform_components.json
Co-authored-by: Patrick MacArthur <pmacarthur@arista.com>
Co-authored-by: Andy Wong <andywong@arista.com>
- Why I did it
Fixes#14236
When a redis event quickly gets outdated during port breakout, error logs like this are seen
Mar 8 01:43:26.011724 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': {'admin_status': 'down'}, 'Ethernet68': {'admin_status': 'down'}}}
Mar 8 01:43:26.012565 r-leopard-56 INFO ConfigMgmt: Writing in Config DB
Mar 8 01:43:26.013468 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': None, 'Ethernet68': None}, 'INTERFACE': None}
Mar 8 01:43:26.018095 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Configure Ethernet64 admin status to down
Mar 8 01:43:26.018309 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Delete Port: Ethernet64
Mar 8 01:43:26.018641 r-leopard-56 NOTICE lldp#lldpmgrd[32]: :- pops: Miss table key PORT_TABLE:Ethernet64, possibly outdated
Mar 8 01:43:26.018654 r-leopard-56 ERR lldp#lldpmgrd[32]: unknown operation ''
- How I did it
Only log the error when the op is not empty and not one of ("SET" & "DEL" )
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Fixes#11873.
#### Why I did it
When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.
In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
Why I did it
Dhcpmon had incorrect RX count for server side packets. It does not raise any false alarms, but could miss catching server side packet count mismatch between snapshot and current counter.
Add debug mode which prints counter to syslog
How I did it
Due to dualtor inbound filter requirement, there are currently two filters, each for listening to rx / tx packets.
Originally, we opened up an rx/tx socket for each interface specified, which causes duplicate socket. Now we initialize the sockets only once. Both sockets are not binded to an interface, and we use vlan to interface mapping to filter packets. For inbound uplinks, we use a portchannel to interface mapping.
Previous dhcpmon counter before dual tor change:
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
Dhcpmon counter after this PR:
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
How to verify it
Ran dhcp relay test to send all four packets in singles and batches on both single ToR and dual ToR. Counter was as expected.