Commit Graph

1283 Commits

Author SHA1 Message Date
Hua Liu
f4b1eb0a5b
Fix IPV6 forced-mgmt-route not work issue (#17299) (#18045)
Fix IPV6 forced-mgmt-route not work issue

Why I did it
IPV6 forced-mgmt-route not work

When add a IPV6 route, should use 'ip -6 rule add pref 32764 address' command, but currently in the template the '-6' parameter are missing, so the IPV6 route been add to IPV4 route table.

Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue. 

Microsoft ADO (number only):24719238
2024-02-07 06:50:57 -08:00
mssonicbld
5cd18eeda7
[ci/build]: Upgrade SONiC package versions (#17956) 2024-02-02 08:15:51 -08:00
Stepan Blyshchak
1672ce81fc [config-topology] use cached variables (#17343)
- Why I did it
Improve  boot performance mostly needed for fast and warmboot

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-02-02 18:32:29 +08:00
Hua Liu
009b0dd7ec Change orchagent stuck message from ERR to WARNING (#17872)
Change orchagent stuck message from ERR to WARNING

#### Why I did it
During switch initialization, sometime Orchagent will busy for more than 40seconds and will trigger process stuck workdog error.
To improve this issue, change watchdog error message to warning message.

##### Work item tracking
- Microsoft ADO: 26517622

#### How I did it
Change orchagent stuck message from ERR to WARNING.

#### How to verify it
Pass all UT.

### Description for the changelog
Change orchagent stuck message from ERR to WARNING.
2024-02-02 18:32:16 +08:00
Zain Budhwani
fe07450a26 Disable eventd and rsyslog plugin in slim images (#17905)
### Why I did it

Disable eventd at buildtime for slim images

##### Work item tracking
- Microsoft ADO **(number only)**:26386286

#### How I did it

Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image

#### How to verify it

Manual testing
2024-02-02 16:32:26 +08:00
Liping Xu
1e4dcbc75d disable restapi for leafRouter in slim image (#17713)
Why I did it
For some devices with small memory, after upgrading to the latest image, the available memory is not enough.

Work item tracking
Microsoft ADO (number only):
26324242
How I did it
Disable restapi feature for LeafRouter which with slim image.

How to verify it
verified on 7050qx T1 (slim image), restapi disabled
verified on 7050qx T0 (slim image), restapi enabled
verified on 7260 T1 (normal image), restapi enabled
2024-02-02 14:32:57 +08:00
Hua Liu
b84e3f9e8a Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281)
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.

#### Why I did it
When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface.

After investigation, I found the IPV6 'default' route table does not add to route lookup:

admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
admin@vlab-01:~$

As compare:
admin@vlab-01:~$ ip -4 rule list
1001:   from all lookup local
32764:  from all to 172.17.0.1/24 lookup default
32765:  from 10.250.0.101 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table exist in IPV4 route lookup

Issue fix by add 'default' route table to route lookup with following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$

##### Work item tracking
- Microsoft ADO: 25798732

#### How I did it
When management interface using 'default' route table, add 'default' route table to IPV6 route lookup.

#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:

### Tested branch (Please provide the tested image version)

- [x]  master-17281.417570-2133d58fa

#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
2024-02-02 14:32:50 +08:00
Nazarii Hnydyn
48f7613f95 [swss/syncd]: Remove dependency on interfaces-config.service (#17739)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2024-02-01 09:34:16 +08:00
Stepan Blyshchak
23190ad000 [config-chassisdb] use cached variables (#17342)
- Why I did it
Improve boot performance mostly needed for fast and warmboot

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-02-01 09:34:09 +08:00
Oleksandr Ivantsiv
3ba603960b [dns] Do not apply dynamic DNS configuration when MGMT interface has static IP address. (#17769)
### Why I did it
Fix the issue detected by[ TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured ](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test.

### How I did it
Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address.

#### How to verify it
Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.
2024-02-01 09:34:02 +08:00
Prince George
6b8549c3bb Fix the fsck script that does filesystem repair (#17424)
Fix the fsck check which is not working. Potentially fixes #16938
Modified fsck script to run on the ext4.fsck on the appropriate disk where SONiC resides

Microsoft ADO: 26098631
2024-01-30 06:32:12 +08:00
Kevin Wang
2b21e64ffc [qos] change the template keyword from Compute-AI to ComputeAI (#17902)
Why I did it
Align the keywords to make qos configuration take effect

Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI

How to verify it
reload minigraph and check the qos configuration
2024-01-29 16:32:34 +08:00
mssonicbld
4166d83f04
[ci/build]: Upgrade SONiC package versions (#17917) 2024-01-27 18:39:42 -08:00
mssonicbld
9f7166b5d2
Change tcp port range to support telemetry and gnmi (#17907) (#17916)
* Reserve tcp port for telemetry and gnmi

* Use ip_local_port_range instead

* Fix sysctl config

Co-authored-by: ganglv <88995770+ganglyu@users.noreply.github.com>
2024-01-26 13:27:30 -08:00
Lawrence Lee
bd63fff758 add timeout to ping6 command (#17729)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2024-01-11 10:41:00 +08:00
Junchao-Mellanox
4b6feaa69c Optimize syslog rate limit feature for fast and warm boot (#17458)
- Why I did it
Optimize syslog rate limit feature for fast and warm boot

- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)

- How to verify it
Manual test
Regression test
2024-01-10 14:35:30 +08:00
mssonicbld
ac4f6fcbc2
[docker_image_ctl.j2]: swss docker initialization improvements (#17628) (#17680) 2024-01-05 04:39:16 +08:00
mssonicbld
c5473c1d8b
Update backend_acl.py to specify ACL table name (#17553) (#17668) 2024-01-04 10:45:26 +08:00
mssonicbld
48885b6ac9
[image_config]: Update DHCP rate-limit for mgmt TOR devices (#17630) (#17655) 2024-01-03 17:36:12 +08:00
Yevhen Fastiuk
f78cb9c55c
[202311][cherry-pick][NTP] Add NTP extended configuration (#17487)
* Add NTP YANG model

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Extend NTP config generation mechanism

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add NTP YANG nodel tests

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add test for NTP Jinja templates

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add ntpdate package

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Fix 'bad' when auth disabled

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* [NTP] Changed owner for ntp keys config file to root and remove read access for other.

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Fix NTP warnings after restarting the service

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Add ability to encrypt/decrypt NTP keys

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Update Configuration reference

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Fix NTP configuration template

* Align the description for setting interface
* Fix the usage of scoped variable

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Fix YANG model description and tests

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Align NTP test according to fixed condition

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Allow eth0 to be as source ifc without defining it

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

* Update sample config with NTP config

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>

---------

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2023-12-21 09:45:29 -08:00
Ying Xie
9e94c3689a
[202311] set sonic release value (#17582)
Why I did it
Each release branch needs to have release number set.

Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
This PR test.
2023-12-21 13:26:53 +08:00
Junhua Zhai
3d7459ccfc
[gbsyncd] Graceful shutdown of syncd process in container gbsyncd (#16812) (#17563)
Fix #16608. Need to gracefully shutdown syncd/gbsyncd individually.
2023-12-20 09:23:14 -08:00
mssonicbld
2e8c2eba14
Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341)" (#15094) (#17367) (#17447) 2023-12-09 10:22:55 +08:00
Kebo Liu
2528b70630 [Mellanox] Add special rsyslog filter for MSN2410 platform (#17365)
- Why I did it
Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely

- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf

- How to verify it
run regression on the MSN2410 platform to make the error log will not be printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-04 22:14:03 +00:00
Lawrence Lee
7d308e340a [arp_update]: Flush neighbors with incorrect MAC info (#17238)
[arp_update]: Flush MAC mismatch neighbors

- Check for MAC mismatch between neighbor entries in the kernel and APPL_DB
- Flush any entries with a mismatch
2023-12-04 22:14:03 +00:00
Xincun Li
b78e3a0d20 Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312)
* Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.
2023-12-04 22:14:03 +00:00
Vivek
5c36732f3b [lldp] Clean up service start logic owing to port init start optimization (#17268)
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-12-04 22:14:02 +00:00
prabhataravind
26ade35fdf [image_config]: Update DHCP rate-limit (#17132)
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios

This is an extension to the change in image_config: copp: Enable rate limiting 
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in 
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199

Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.

Microsoft ADO 25776614:

Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2023-12-04 22:14:02 +00:00
abdosi
4a7aa2634f
[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714)
What I did:
In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's

How I did:

- Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml
- Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers
- In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community.
- In TSB delete the above new route-map.

How I verify:

Manual Verification

UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-20 09:42:02 -08:00
Ze Gan
9f08f88a0d
[dpu]: Add DPU database service (#17161)
Sub PRs:

sonic-net/sonic-host-services#84
#17191

Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.

Microsoft ADO (number only): 25072889

How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-17 09:10:03 -08:00
ganglv
c71fb3a30f
Share image for gnmi and telemetry (#16863)
Why I did it
Share docker image to support gnmi container and telemetry container

Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.

How to verify it
Run end to end test.
2023-11-08 08:54:36 +08:00
prabhataravind
7e49530459
[copp]: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld (#14859)
Why I did it
It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses.
This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types.

Work item tracking
Microsoft ADO 17964421:

How I did it
Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1.
The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps.

How to verify it
Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3.

 import threading
 from scapy.all import *
 
 def send_dhcp_discover(intf):
     dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \
                         /IP(src='1.1.1.1',dst='255.255.255.255') \
                         /UDP(sport=68,dport=67) \
                         /DHCP(options=[('message-type','discover'),('end')])
     sendp(dhcp_discover,count=100000,iface=intf)
 
 
 if __name__ == "__main__":
     t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",))
     t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",))
     t1.start()
     t2.start()
     t1.join()
     t2.join()

Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue.

Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited.

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2023-10-25 10:49:24 -07:00
Kebo Liu
31451295d5
Add special rsyslog filter for MSN2700 platform (#16684)
- Why I did it
Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely.

- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf

- How to verify it
run regression on the MSN2700 platform to make the error log will not be printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-10-24 17:54:44 +03:00
Samuel Angebault
e4a497183a
Add build option to reduce final image size (#16729)
* Reduce SONiC image filesystem size

Add a build option to reduce the image size.
The image reduction process is affecting the builds in 2 ways:
 - change some packages that are installed in the rootfs
 - apply a rootfs reduction script

The script itself will perform a few steps:
 - remove file duplication by leveraging hardlinks
   - under /usr/share/sonic since the symlinks under the device folder are lost during the build.
   - under /var/lib/docker since the files there will only be mounted ro
 - remove some extra files (man, docs, licenses, ...)
 - some image specific space reduction (only for aboot images currently)

The script can later be improved but for now it's reducing the rootfs
size by ~30%.

* restore fully featured vim package
2023-10-24 10:01:58 +08:00
Samuel Angebault
d760fb928c
Disable CPU C-States other than C1 (#16703)
Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state.
There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.

How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver.

How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
2023-10-13 20:24:39 -07:00
Longxiang Lyu
072eaed2e3
[snmp] Check intfmgrd running before start (#16588)
Add pre start check to ensure intfmgrd is running.
The check will run for 20 seconds at most.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2023-10-13 16:00:51 -07:00
Saikrishna Arcot
469aed2cf7
[baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826)
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. 
Since we're building openssh with some patches, we need to update our version as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-11 10:42:20 -07:00
Vadym Hlushko
3bd396043e
[buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-03 08:35:57 -07:00
lixiaoyuner
bca2ce25ef
[k8master]: Install nc cmd for k8s master network issue debug (#16745) 2023-09-30 01:16:51 -07:00
Vaibhav Hemant Dixit
07f8507911
[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733)
Why I did it
Fix: #16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR #15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
2023-09-28 09:37:21 -07:00
mssonicbld
4768b76610 [ci/build]: Upgrade SONiC package versions 2023-09-26 12:32:31 +08:00
Yevhen Fastiuk
52f6dd65a3
Improve remote fetch (#12795)
### Why I did it
To fix those errors:
One:
```
Connecting to urm.nvidia.com (urm.nvidia.com)|*.*.*.*|:443... connected.
GnuTLS: Error in the pull function.
Unable to establish SSL connection.
Error 4
make[1]: Leaving directory '/sonic/src/smartmontools'
[ target/debs/bullseye/smartmontools_6.6-1_amd64.deb ]
```
Second:
```
Get:90 https://debian-mirror-url buster/main amd64 librrd-dev amd64 1.7.1-2 [284 kB]
Get:91 https://debian-mirror-url buster/main amd64 psmisc amd64 23.2-1+deb10u1 [126 kB]
Get:92 https://debian-mirror-url buster/main amd64 python-smbus amd64 4.1-1 [12.2 kB]
Get:93 https://debian-mirror-url buster/main amd64 python3.7-dev amd64 3.7.3-2+deb10u3 [510 kB]
Get:94 https://debian-mirror-url buster/main amd64 python3-dev amd64 3.7.3-1 [1264 B]
Get:95 https://debian-mirror-url buster/main amd64 python3-smbus amd64 4.1-1 [12.5 kB]
Get:96 https://debian-mirror-url buster/main amd64 rrdtool amd64 1.7.1-2 [485 kB]
Fetched 122 MB in 12s (9976 kB/s)
E: Failed to fetch https://debian-mirror-url/pool/main/p/python-defaults/python2-minimal_2.7.16-1_amd64.deb  500  Internal Server Error [IP: *.*.*.* 443]
E: Failed to fetch https://debian-mirror-url/pool/main/f/fontconfig/fontconfig-config_2.13.1-2_all.deb  500  Internal Server Error [IP: *.*.*.* 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
The command '/bin/sh -c apt-get update &&       apt-get install -y          build-essential         python3-dev             ipmitool                librrd8                 librrd-dev              rrdtool                 python-smbus            python3-smbus           dmidecode               i2c-tools               psmisc                  libpci3' returned a non-zero code: 100
[ target/docker-platform-monitor.gz ]
Error 1
```

#### How I did it
Add retry mechanism to apt, wget, and curl hooks
2023-09-23 18:07:04 -07:00
abdosi
2ce04ab91d
[chassisd]: Add alternate to the bridge interface created on chassis supervisor. (#16505)
Add alternate name eth1-midplane to Linux bridge br1 created on supervisor on some chassis platforms.
See description here: #16504

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-09-23 00:27:21 -07:00
Hua Liu
11f5a75425
[tacacs]: Fix tcpdump report error when tacacs enabled (#16372)
Fix tcpdump report error when tacacs enabled.

Why I did it
Fix tcpdump report error when tacacs enabled:
Sep 1 09:25:18.189395 vlab-01 ERR tcpdump: nss_tacplus: /etc/tacplus_nss.conf fopen failed
Sep 1 09:25:18.189606 vlab-01 ERR tcpdump: nss_tacplus: bad config or server line for nss_tacplus

This is because debian add a patch create AppArmor profile for resource access control. The profile need update to allow tcpdump access /etc/tacplus_nss.conf.

Work item tracking
Microsoft ADO: 17667308

How I did it
Modify tcpdump AppArmor profile, add new line to allow tcpdump access TACACS config file:

/etc/tacplus_nss.conf r,
2023-09-23 00:07:53 -07:00
mssonicbld
a8af76f108 [ci/build]: Upgrade SONiC package versions 2023-09-22 12:32:57 +08:00
Zain Budhwani
382d68fe42
Move syncd events to syncd.conf (#15950)
### Why I did it

syncd events should have tag sonic-events-syncd, not sonic-events-host. Created a new conf file which will have syncd events

##### Work item tracking
- Microsoft ADO **(number only)**:17747466

#### How I did it

Code change

#### How to verify it

Pipeline
2023-09-19 17:29:49 -07:00
vdahiya12
45a852233b
[pmon] update gRPC version to 1.57.0 (#16257)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-09-15 16:41:51 -07:00
Yaqiang Zhu
d11e0a214e
Add use_unix_socket_path to supervisor-proc-exit-listener (#16548)
Why I did it
ConfigDBConnector in supervisor-proc-exit-listener uses default parameter to connect CONFIG_DB (connect by 127.0.0.1:6379) which would fail at non-host network mode container, because they are not sharing the same network and socket.

How I did it
Add a new parameter use_unix_socket_path to this script to indicate whether to use socket to connect CONFIG_DB.

How to verify it
Build image and install it, kill critical processes in container and container crushed.
2023-09-15 16:23:25 -07:00
Saikrishna Arcot
f207a9b0e0
Fix potentially not having any loopback address on lo interface (#16490)
In #15080, there was a command added to re-add 127.0.0.1/8 to the lo
interface when the networking configuration is being brought down.
However, the trigger for that command is `down`, which, looking at
ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is
removed. This means there may be a period of time where there are no
loopback addresses assigned to the lo interface, and redis commands will
fail.

Fix this by changing this to pre-down, which should run well before
127.0.0.1/16 is removed, and should always leave lo with a loopback
address.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-09-14 12:55:50 -07:00
ganglv
ce7145475d
Fix grpc package for ptf container (#16536)
Why I did it
PTF container needs to use new grpcio package.

Work item tracking
Microsoft ADO (number only):
How I did it
Update versions-py2

How to verify it
Check pipeline artifact
2023-09-14 08:27:32 +08:00