sonic-buildimage

Author	SHA1	Message	Date
Ashwin Hiranniah	ada7c6a72e	Add pensando platform (#15978 ) This commit adds support for the Pensando ASIC called ELBA. ELBA is used in PCI-based cards and in smartswitches. #### Why I did it This commit introduces the Pensando platform, which is based on the ELBA ASIC. ##### Work item tracking - Microsoft ADO (number only): #### How I did it Created the platform/pensando folder and makefiles specific to Pensando. This mainly creates the pensando docker (which OEMs need to download before building an image), which has all the userspace needed to initialize and use the DPU (ELBA ASIC). The build process produces two images, which can be used from ONIE and goldfw. The recommendation is to use ONIE. #### How to verify it Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP. ##### Description for the changelog Add pensando platform support.	2023-12-04 14:41:52 -08:00
Lawrence Lee	04b30fc378	[tph]: Detect LAG flaps from APPL_DB (#16879 ) Why I did it A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages. Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels. How I did it Subscribe to APPL_DB for updates on LAG operational state Track currently sniffed interfaces How to verify it Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run Shut down a portchannel interface, verify that sniffer does not restart Send tunnel packets, verify ping commands are still run Bring up portchannel interface, verify that sniffer restarts Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2023-11-09 16:01:59 -08:00
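
The operational-status check described in the commit above can be sketched as follows. This is only an illustration, not the actual tunnel packet handler code: the function names and the plain-dict stand-in for the LAG entries in APPL_DB are hypothetical. Only the idea comes from the commit message: look at operational status rather than admin status, track the currently sniffed interfaces, and restart the sniffer only when a portchannel comes up that is not yet being sniffed.

```python
def up_portchannels(lag_table):
    """Return the portchannels whose *operational* status is 'up'.

    lag_table is a hypothetical dict standing in for LAG entries, e.g.
    {"PortChannel0001": {"admin_status": "up", "oper_status": "down"}}.
    """
    return {name for name, fields in lag_table.items()
            if fields.get("oper_status") == "up"}


def process_lag_update(lag_table, sniffed_intfs):
    """Decide whether the sniffer must restart after a LAG state update.

    Returns (restart, new_sniffed_intfs). A restart is needed only when an
    operationally up portchannel is not already being sniffed; an interface
    going down simply drops out of the tracked set.
    """
    up = up_portchannels(lag_table)
    restart = bool(up - set(sniffed_intfs))
    return restart, up
```

Comparing against the tracked set, instead of restarting on every update, is what keeps a portchannel shutdown from restarting the sniffer in this sketch.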
Hua Liu	6e3260098f	Enable ZMQ between GNMI and Orchagent (#16661 ) Enable ZMQ on gnmi and orchagent #### Why I did it Improve GNMI API performance for Dash resources #### How I did it Modify the gnmi and orchagent service start scripts to add the ZMQ parameter. #### How to verify it Pass all UT & E2E tests. Manually verify by creating Dash resources via the gnmi API.	2023-10-09 14:22:50 -07:00
Zhaohui Sun	286ec3edbf	Change orchagent pop batch size from 8192 to 1024 (#16125 ) ### Why I did it A Lua script running in the background can keep redis-server very busy when the batch size is 8192. If the handling time exceeds the default 5 s, redis-server stops responding to other processes, which causes syncd to crash. ``` Aug 9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757 Aug 9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of ' Aug 9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error' Aug 9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: 2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error Aug 9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: 2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error ``` 88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua ##### Work item tracking - Microsoft ADO 24741990: #### How I did it Change batch size from 8192 to 1024. #### How to verify it Run all test cases in sonic-mgmt to verify the system stability. ### Tested branch (Please provide the tested image version) - [x] 20220531.36	2023-08-14 17:49:49 -07:00
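
The effect of the smaller batch size in the commit above can be illustrated with a minimal sketch. It does not model Redis, orchagent, or the actual `consumer_state_table_pops.lua` script; `pop_in_batches` is a hypothetical helper that only shows how a bounded batch size splits the same backlog into more, shorter units of work, so that no single call has to process 8192 entries at once.

```python
def pop_in_batches(pending, batch_size=1024):
    """Yield successive batches of at most batch_size entries from a list."""
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]


# A backlog of 8192 entries is drained in one call at batch size 8192,
# but in eight shorter calls at batch size 1024.
backlog = list(range(8192))
large = list(pop_in_batches(backlog, batch_size=8192))
small = list(pop_in_batches(backlog, batch_size=1024))
```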
nmoray	f978b2bb53	Timezone sync issue between the host and containers (#14000 ) #### Why I did it To fix the timezone sync issue between the containers and the host. If a certain timezone has been configured on the host (SONIC) then the expectation is to reflect the same across all the containers. This will fix [Issue:13046](https://github.com/sonic-net/sonic-buildimage/issues/13046). For instance, a PST timezone has been set on the host and if the user checks the link flap logs (inside the FRR), it shows the UTC timestamp. Ideally, it should be PST.	2023-06-25 16:36:09 -07:00
Hua Liu	05f1a5a31e	Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429 ) Add watchdog mechanism to swss service and generate alert when swss have issue. Work item tracking Microsoft ADO (number only): 16578912 What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process is stuck and stops processing, monit cannot detect or report it. How I verified it Pass all UT. Manually verified that process_monitoring/test_critical_process_monitoring.py passes. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-12 17:53:54 -07:00
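
The heartbeat idea behind the watchdog in the commit above can be sketched roughly as below. This is not the code of the real watchdog listener: the function name, the dict of last-heartbeat timestamps, and the 60-second timeout are all assumptions chosen to match the "(1.0 minutes)" in the sample log line. It only shows why a heartbeat check catches a process that still exists but has stopped making progress.

```python
def stuck_processes(last_heartbeat, now, timeout_s=60.0):
    """Return {process: minutes since last heartbeat} for overdue processes.

    last_heartbeat maps a process name to the timestamp (in seconds) of its
    most recent heartbeat. The 60 s default timeout is a hypothetical value.
    """
    return {name: round((now - seen) / 60.0, 1)
            for name, seen in last_heartbeat.items()
            if now - seen >= timeout_s}


# orchagent last reported at t=0 and is overdue at t=60; portsyncd is fine.
for name, minutes in stuck_processes({"orchagent": 0.0, "portsyncd": 50.0},
                                     now=60.0).items():
    print(f"Process '{name}' is stuck ({minutes} minutes).")
```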
Ye Jianquan	cec9d7b83a	Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 )" (#15390 ) This reverts commit `44427a2f6b`. The Docker image was not updated during PR validation, which caused PR check failures. Force merging this revert; once the cache is updated after this PR is merged, the issue should be fixed.	2023-06-09 09:10:35 +08:00
Hua Liu	44427a2f6b	Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 ) This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 being merged first. What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process is stuck and stops processing, monit cannot detect or report it. How I verified it Pass all UT. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-05 22:21:17 -07:00
Junchao-Mellanox	2126def04e	[infra] Support syslog rate limit configuration (#12490 ) - Why I did it Support syslog rate limit configuration feature - How I did it Remove unused rsyslog.conf from containers Modify docker startup script to generate rsyslog.conf from template files Add metadata/init data for syslog rate limit configuration - How to verify it Manual test New sonic-mgmt regression cases	2022-12-20 10:53:58 +02:00
Arvindsrinivasan Lakshmi Narasimhan	7db272556e	[chassis] update the asic_status.py to read from CHASSIS_FABRIC_ASIC_INFO_TABLE (#12576 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com Why I did it Fixes #12575 and #12575 How I did it In the PR sonic-net/sonic-platform-daemons#311 chassisd updates to CHASSIS_FABRIC_ASIC_INFO with the fabric asic info. Updating the asic_status.py to read from the correct table. How to verify it test on chassis Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2022-12-07 21:53:47 -08:00
Zain Budhwani	98ace33b0f	Add rsyslog plugin regex for select operation failure (#12659 ) Added events for select op, alpm parity error, moved dhcp events from host to container	2022-11-13 21:41:33 -08:00
Lawrence Lee	37ad8befc1	[tunnel_pkt_handler]: Skip nonexistent intfs (#12424 ) - Skip the interface status check if the interface does not exist. In the future, when the interface is created/comes up this check will be triggered again. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-10-20 09:29:57 -07:00
Lawrence Lee	4a996f3662	[swss]: Run tunnel_pkt_handler on dualtor only (#11627 ) At SWSS docker init time, check the device subtype and enable tunnel packet handler only if it is dualtor Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-08-09 16:19:59 -07:00
abdosi	a380105461	Enable ARP Update Script for Packet based chassis. (#11465 ) What I did: The following changes were made for packet-based chassis: 1> Run arp_update on LCs to resolve static route nexthops over backend port-channel interfaces. 2> On the Supervisor, make sure arp_update exits gracefully.	2022-07-26 16:50:16 -07:00
Hua Liu	a9b7a1facd	Replace swsssdk with swsscommon (#11215 ) #### Why I did it Update scripts in sonic-buildimage from py-swsssdk to swsscommon #### How I did it Change code to use swsscommon. #### How to verify it Pass all E2E test cases #### Description for the changelog Update scripts in sonic-buildimage from py-swsssdk to swsscommon	2022-07-11 10:01:10 +08:00
yozhao101	1720fa21d9	[tunnel_packet_handler] Add a whitespace in the warning syslog message. (#11232 ) *This PR aims to add a whitespace in the warning syslog message of process tunnel_packet_handler. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2022-06-28 17:29:02 -07:00
Yakiv Huryk	0ced7081c7	[asan] add print_suppressions=0 to ASAN configs (#11252 ) - Why I did it To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker - How I did it Added the "print_suppressions=0" to ASAN configs. - How to verify it add a suppression to some ASAN-enabled component (the suppression should catch some leak) build with ENABLE_ASAN=y run a test and see that the ASAN report is empty instead of having the suppression summary Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>	2022-06-28 18:45:52 +03:00
Lawrence Lee	0eeb249fd8	[swss]: Convert swss docker to bullseye (#10484 ) * [swss]: Convert swss docker to bullseye Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-05-17 13:55:59 -07:00
Kalimuthu-Velappan	bc30528341	Parallel building of sonic dockers using native dockerd(dood). (#10352 ) Currently, the build dockers are created as user dockers (docker-base-stretch-<user>, etc.) that are specific to each user. But the sonic dockers (docker-database, docker-swss, etc.) are created with a fixed docker name that is common to all users: docker-database:latest docker-swss:latest When multiple builds are triggered on the same build server, this causes parallel build issues because all the build jobs try to create the same docker with the latest tag. This happens only when the sonic dockers are built using the native host dockerd for sonic docker image creation. This patch creates all sonic dockers as user sonic dockers and then, while saving and loading the user sonic dockers, renames them to the correct sonic docker names with the tag latest. docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag The user sonic docker names are derived from the 'DOCKER_USERNAME' and 'DOCKER_USERTAG' make env variables; a Jinja template replaces the FROM docker name with the correct user sonic docker name for loading and saving the docker image.	2022-04-28 08:39:37 +08:00
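
The naming scheme in the commit above (`docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag`) can be sketched as a pair of mappings. The real renaming is done by the build's makefiles and Jinja templates; the two helper functions below, and the sample user and tag values, are hypothetical and only illustrate why per-user names avoid collisions between parallel builds.

```python
def user_image_name(image, user, tag):
    """Per-user build name, e.g. 'docker-database-alice:t1'."""
    return f"{image}-{user}:{tag}"


def canonical_image_name(image):
    """Fixed name used once the image is saved or loaded."""
    return f"{image}:latest"


# Two users building on the same server get distinct working names,
# while the saved image keeps the common name in both cases.
alice = user_image_name("docker-database", "alice", "t1")
bob = user_image_name("docker-database", "bob", "t2")
```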
Maxime Lorrillere	0606add017	[chassis] Get asic PCI ID from CHASSIS_STATE_DB and update asic_id in CONFIG_DB (#9681 ) Asic PCI ID (PCI address) is collected by chassisd (inside pmon - Azure/sonic-platform-daemons#175) and saved in CHASSIS_STATE_DB (in redis_chassis). CHASSIS_STATE_DB is accessible by swss containers. At docker-init.sh (script is called after swss container is created and before anything that could run in swss like orchagent...), we wait until asic PCI ID of the corresponding asic is populated by chassisd. We then update asic_id in CONFIG_DB of asic's database. A system supporting dynamic asic PCI ID identification requires to have a file (empty) use_pci_id_chassis in its platform dir. When orchagent runs, it has correct asic PCI ID in its CONFIG_DB. Together with this PR: Azure/sonic-platform-daemons#175 Azure/sonic-platform-common#185 Signed-off-by: Maxime Lorrillere <mlorrillere@arista.com> Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>	2022-04-25 13:09:42 -07:00
kellyyeh	330d11a128	Add EPMS and MgmtTsToR (#10478 )	2022-04-07 21:49:42 -07:00
Stepan Blyshchak	4426f7715f	[scapy] update scapy to 2.4.5 and patch it (#10457 ) Why I did it Running warm-reboot in a loop for 500 times leads to this error on 318-th iteration: Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors Traceback (most recent call last): Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors File "/usr/bin/restore_neighbors.py", line 24, in <module> Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors from scapy.all import conf, in6_getnsma, inet_pton, inet_ntop, in6_getnsmac, get_if_hwaddr, Ether, ARP, IPv6, ICMPv6ND_NS, ICMPv6NDOptSrcLLAddr Apr 2 15:56:27.346795 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/all.py", line 25, in <module> Apr 2 15:56:27.346956 sonic INFO swss#/supervisord: restore_neighbors from scapy.route import * Apr 2 15:56:27.346995 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/route.py", line 205, in <module> Apr 2 15:56:27.347089 sonic INFO swss#/supervisord: restore_neighbors conf.iface = get_working_if() Apr 2 15:56:27.347129 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/linux.py", line 128, in get_working_if Apr 2 15:56:27.347213 sonic INFO swss#/supervisord: restore_neighbors ifflags = struct.unpack("16xH14x", get_if(i, SIOCGIFFLAGS))[0] Apr 2 15:56:27.347250 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/common.py", line 31, in get_if Apr 2 15:56:27.347345 sonic INFO swss#/supervisord: restore_neighbors return ioctl(sck, cmd, struct.pack("16s16x", iff.encode("utf8"))) Apr 2 15:56:27.347365 sonic INFO swss#/supervisord: restore_neighbors OSError: [Errno 19] No such device The issue was reported to scapy devs secdev/scapy#3369, the fix is secdev/scapy#3371, however there is no released scapy version with this fix right now, thus decided to build 
scapy v2.4.5 from sources and apply the fix in a form of a patch. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-04-07 14:23:35 +03:00
Junchao-Mellanox	106fac5f09	[counter] Fix issue: non default counters will be delayed forever after fastboot (#10413 ) - Why I did it Fastboot will delay all counters in CONFIG DB, it relies on enable_counters.py to recover the delayed counters. However, enable_counters.py does not recover those non-default counters. - How I did it For non-default counters, if it is in CONFIG DB, put delay status to false after the waiting. - How to verify it Manual test	2022-03-31 15:23:57 +03:00
Lawrence Lee	b31df59c7c	[tun_pkt]: Wait for AsyncSniffer to init fully (#10346 ) Fix for Tunnel packet handler can crash at system startup Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-03-30 14:03:29 -07:00
judyjoseph	8e642848c2	Introduce the asic_subtype field for adding the sub platform variants. (#10235 ) * Introduce the asic_subtype field for adding the sub platform variants. It uses the value of TARGET_MACHINE variable in slave.mk.	2022-03-28 11:22:32 -07:00
Saikrishna Arcot	5617b1ae3e	Image disk space reduction (#10172 ) # Why I did it Reduce the disk space taken up during bootup and runtime. # How I did it 1. Remove python package cache from the base image and from the containers. 2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup. 3. For the partition containing `/host`, don't reserve any blocks for just the root user. This just makes sure all disk space is available for all users, if needed during upgrades (for example). * Remove pip2 and pip3 caches from some containers Only containers which appeared to have a significant pip cache size are included here. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Don't create var-log.ext4 if we're storing logs in memory Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Run tune2fs on the device containing /host to not reserve any blocks for just the root user Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-03-15 18:12:49 -07:00
Lawrence Lee	4d2a55d373	[swss]: Wait for vlan intf to start ndppd (#10119 ) - Use the `wait_for_link.sh` script to delay ndppd start until after the VLAN interface is ready - Avoids issue where ndppd tries to change interface attributes before the interface is ready	2022-03-02 16:23:56 -08:00
Lawrence Lee	47d9b26063	Revert "[swss]: Wait for vlan intf to start ndppd (#10036 )" (#10085 ) This reverts commit `91204879df`. #10036 breaks ndppd functionality	2022-02-28 15:42:02 -08:00
Lawrence Lee	91204879df	[swss]: Wait for vlan intf to start ndppd (#10036 ) - Use the `wait_for_link.sh` script to delay ndppd start until after the VLAN interface is ready - Avoids issue where ndppd tries to change interface attributes before the interface is ready Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-02-24 17:54:45 -08:00
Oleksandr Ivantsiv	25a0ce5eb1	[asan] Add address sanitizer support. (#9857 ) Implement infrastructure that allows enabling address sanitizer for docker containers. Enable address sanitizer for SWSS container. - Why I did it To add a possibility to compile SONiC applications with address sanitizer (ASAN). ASAN is a memory error detector for C/C++. It finds: 1. Use after free (dangling pointer dereference) 2. Heap buffer overflow 3. Stack buffer overflow 4. Global buffer overflow 5. Use after return 6. Use after the scope 7. Initialization order bugs 8. Memory leaks - How I did it By adding new ENABLE_ASAN configuration option. - How to verify it By default ASAN is disabled and the SONiC image is not affected. When ASAN is enabled it inspects all allocation, deallocation, and memory usage that the application does in run time. To verify whether the application has memory errors tests that trigger memory usage of the application should be run. Ideally, the whole regression tests should be run. Memory leaks reports will be placed in /var/log/asan/ directory of SONiC host OS. Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>	2022-02-09 13:29:18 +02:00
Lawrence Lee	eff80f750f	[swss]: Reduce tunnel_packet_handler memory usage (#9762 ) * Configure scapy to not store sniffed packets Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-02-07 11:55:48 -08:00
Andriy Yurkiv	cb3b9416a6	[Mellanox][VXLAN] add params to vxlan.json file in order to configure VXLAN src port range feature (#9658 ) - Why I did it Remove the obsolete parameter that enables the static VXLAN src port range and provide functionality to generate a json config file according to the appropriate parameter in config_db. Done for SN3800: • Mellanox-SN3800-D28C50 • Mellanox-SN3800-C64 • Mellanox-SN3800-D28C49S1 (New 10G SKU) SN2700: • Mellanox-SN2700-D48C8 - How I did it Removed SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 from the appropriate sai.profile files. Created the vxlan.json file and added a few params that depend on DEVICE_METADATA.localhost.vxlan_port_range - How to verify it The file /etc/swss/config.d/vxlan.json should be generated inside the swss docker when it restarts [ { "SWITCH_TABLE:switch": { "vxlan_src": "0xFF00", "vxlan_mask": "8" }, "OP": "SET" } ] Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>	2022-01-31 15:57:30 +02:00
abdosi	6c507329b7	Enable/Disable Order ECMP feature. (#9651 ) Updated Jinja2 Template in switch.json.j2 for enabling/disabling Order ECMP feature based on device role. Changes as per design: Azure/SONiC#896	2022-01-06 16:40:50 -08:00
Saikrishna Arcot	bd479cad29	Create a docker-swss-layer that holds the swss package. This is to save about 50MB of disk space, since 6 containers individually install this package. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-01-06 09:26:55 -08:00
zzhiyuan	a6d0a27a18	[Arista] Increase switch PCIe timeout for 7060-cx32s (#9248 ) Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com> Why I did it The Arista 7060 platform has a rare and unreproducible PCIe timeout that could possibly be solved by increasing the switch PCIe timeout value. To do this we'll call a script for this platform to increase the PCIe timeout on boot-up. No issues would be expected from the setpci command. From the PCIe spec: "Software is permitted to change the value in this field at any time. For Requests already pending when the Completion Timeout Value is changed, hardware is permitted to use either the new or the old value for the outstanding Requests, and is permitted to base the start time for each Request either on when this value was changed or on when each request was issued." How I did it Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in the device data folder. Additionally, add the pciutils dependency to docker-orchagent so it can run the setpci commands. How to verify it On bootup of an Arista 7060, execute: lspci -vv -s 01:00.0 | grep -i "devctl2" to check that the timeout has changed.	2021-12-17 08:43:25 -08:00
Lawrence Lee	7bd0a2ad11	[swss]: Listen for undeliverable tunnel packets (#9348 ) - Create a script in the orchagent docker container which listens for these encapsulated packets which are trapped to CPU (indicating that they cannot be routed/no neighbor info exists for the inner packet). When such a packet is received, the script will issue a ping command to the packet's inner destination IP to start the neighbor learning process. - This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.	2021-12-14 14:45:23 -08:00
Junchao-Mellanox	554b04f312	Add trap flow counter support (#8940 ) *Add trap flow counter support	2021-11-24 15:26:52 -08:00
Stephen Sun	b3ccef9c08	[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133 ) - Why I did it This is to update the common sonic-buildimage infra for reclaiming buffer. - How I did it Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there. Rendering is done here for passing azure pipeline. Load zero_profiles.json when the dynamic buffer manager starts Generate inactive port list to reclaim buffer Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-11-24 15:00:23 +02:00
Stepan Blyshchak	a2c2d67098	[ACL] enable ACL FC when generating config from minigraph but disable by default (#8908 ) * [ACL] enable ACL FC when generating config from minigraph but disable by default Why I did it To support ACL counters on Flex Counter Infrastructure. How I did it Enable ACL FC in init_cfg and minigraph. Disable when generating configuration from preset. How to verify it Together with dependent PRs. Run ACL/Everflow test suite. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-11-11 09:07:54 +08:00
tjchadaga	8544147a70	Fix for additional intf flap during fast-reboot (#9166 )	2021-11-08 15:21:11 -08:00
Lawrence Lee	7c0507b6db	[swss]: Start ndppd after vlanmgrd (#9155 ) Why I did it During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage. How I did it Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-03 11:03:01 -07:00
Sudharsan Dhamal Gopalarathnam	fcff3f3d09	VxLAN Tunnel Counters and Rates implementation (#8369 ) * Enable flex counters for Vxlan tunnel	2021-11-01 10:42:21 -07:00
shlomibitton	112fda7877	[Flex Counters] Reset flex counters delay flag on config DB when enable_counters script is called (#8500 ) #### Why I did it Reset flex counters delay flag on config DB when enable_counters script is called to allow enablement of flex counters in orchagent. #### How I did it Push to config DB 'false' value for delay indication when enable_counters script is called before enabling the counters. #### How to verify it Observe counters are created when enable_counters script is called.	2021-09-01 21:17:36 -07:00
Blueve	aa01315f60	[ARM] Fix issue where the ping6 tool is missing from orchagent docker (#8345 ) Signed-off-by: Jing Kan jika@microsoft.com	2021-08-05 22:00:50 +08:00
ngoc-do	710563f83d	[fabric] Disable unnecessary processes in swss and the orchagent-portsyncd dependency for fabric asic (#5569 ) * Disable unnecessary processes in swss for fabric asic Signed-off-by: ngocdo <ngocdo@arista.com>	2021-06-09 10:53:47 -07:00
Andriy Yurkiv	0c2521b936	Set default values only on the first start (#7735 )	2021-06-09 18:39:22 +08:00
yozhao101	1a3cab43ac	[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-04 10:16:53 -07:00
Lawrence Lee	1b39424520	[docker-orchagent]: Increase ndppd kernel poll interval (#7456 ) Why I did it ndppd by default reads /proc/net/ipv6_route every 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds. How I did it Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds. How to verify it Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-04-30 16:30:30 -07:00
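
The "~24 days" figure in the commit above can be checked with a short calculation. The commit only says "max integer value"; the sketch below assumes the interval is expressed in milliseconds and capped at the signed 32-bit maximum, an assumption that reproduces the stated figure.

```python
# Assumption: interval in milliseconds, capped at the signed 32-bit maximum.
INT32_MAX = 2**31 - 1                # 2147483647
MS_PER_DAY = 24 * 60 * 60 * 1000     # 86400000

days = INT32_MAX / MS_PER_DAY
print(f"{days:.2f} days")            # prints "24.86 days"
```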
Prince Sunny	20c8dd2691	[IPinIP] Add Loopback2 interface, change dscp mode to uniform (#7234 ) Co-authored-by: Ubuntu <prsunny>	2021-04-07 09:58:12 -07:00
Stephen Sun	0b16ca4ae9	[monit] Avoid monit error log by removing "-l" from monit_swss|buffermgrd (#7236 ) Avoid the following error messages while dynamic buffer calculation is enabled ``` ERR monit[491]: 'swss|buffermgrd' status failed (1) -- '/usr/bin/buffermgrd -l' is not running in host ``` Change /usr/bin/buffermgrd -l to /usr/bin/buffermgrd. The buffermgrd is started by -l for traditional model or -a for dynamic model. So we need to use the common section of both. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-04-06 10:12:23 -07:00


247 Commits