sonic-buildimage

Author	SHA1	Message	Date
Shilong Liu	ab91b607f2	[ci] Fix remove sonic-slave-* docker image issue when building sonic-slave* (#10296 ) (#10381 )	2022-03-30 16:08:52 +08:00
xumia	ca85c6be5e	[Unit Test]: Fix sonic config engine test not stable issue(#10147 ) (#10355 ) Why I did it Fix sonic config engine test not stable issue	2022-03-29 10:41:11 +08:00
Alexander Allen	b10833425c	[202106] [Mellanox] Fix DPB supported breakout modes (#10129 ) Cherry pick of #10072 - Why I did it Removing DPB breakout modes that require adjacent ports to be disabled as that is not supported by the current DPB infrastructure. Correspondingly had to remove the hwsku.json file from any SKUs which utilized these removed modes such that the system will fall back to ports_config.ini and DPB will not be supported for those SKUs. - How I did it Modified the platform.json files of Mellanox devices. - How to verify it Execute show int break [Ethernet] on the affected platforms and ensure there are no modes present that would require an adjacent port to be disabled to function.	2022-03-27 09:15:23 +03:00
Oleksandr Ivantsiv	40265e8dde	[sonic-config-engine]: Improve comparison between default and supported breakout modes. (#9278 ) Closes #7958 #### Why I did it The previous implementation of sonic-cfggen did a simple comparison between the default breakout mode in hwsku.json and the supported modes in platform.json. To set a different default speed in hwsku.json, it was required to add one more entry to the supported modes in the platform.json file: 1x10G[100G,50G] vs 1x100G[50G,10G] The new implementation does more intelligent parsing and analysis of supported and default modes. It allows changing the default speed without adding a new entry to platform.json. #### How I did it Add more intelligent parsing and analysis of supported and default modes. #### How to verify it Run sonic-config-engine unit tests from sonic-config-engine/tests directory	2022-03-24 05:15:11 +00:00
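The comparison described in the commit above can be illustrated with a minimal sketch. This is not the sonic-cfggen code: `parse_mode` and `default_is_supported` are hypothetical names, and the sketch assumes only simple modes of the form `NxSPEED[ALT,ALT]`, treating two modes as compatible when the lane split matches and every speed named by the default appears among the supported speeds.

```python
import re

# Matches simple breakout modes such as "1x100G[50G,10G]" or "2x50G".
# Assumption: composite modes are out of scope for this sketch.
_MODE_RE = re.compile(r"^(\d+)x(\d+G)(?:\[([^\]]*)\])?$")


def parse_mode(mode):
    """Split a breakout mode into (port_count, set_of_speeds)."""
    match = _MODE_RE.match(mode.strip())
    if match is None:
        raise ValueError("unsupported breakout mode format: %r" % mode)
    count = int(match.group(1))
    speeds = {match.group(2)}
    if match.group(3):
        # Alternative speeds listed inside the square brackets.
        speeds.update(s.strip() for s in match.group(3).split(",") if s.strip())
    return count, speeds


def default_is_supported(default_mode, supported_modes):
    """Return True if some supported mode has the same port count and
    covers every speed named by the default mode, in any order."""
    d_count, d_speeds = parse_mode(default_mode)
    for mode in supported_modes:
        count, speeds = parse_mode(mode)
        if count == d_count and d_speeds <= speeds:
            return True
    return False
```

With this approach, a default of `1x10G[100G,50G]` is accepted against a supported entry of `1x100G[50G,10G]` because the speed sets are compared rather than the literal strings.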
Alexander Allen	b5849c0918	[pmon] Clean up supervisord chassis_db_init entry and fix startsecs (#10071 ) Why I did it Code review was still in progress when #9858 was merged and upon further testing I have arrived at a better solution. How I did it Modified supervisord configuration j2 template for pmon to require no minimum uptime for chassisd_db_init and to remove the redundant exit_codes directive How to verify it Boot switch and verify in syslog that there are no errors related to chassis_db_init	2022-03-24 05:14:38 +00:00
Sudharsan Dhamal Gopalarathnam	4da20e7ff9	[containerd]Fixing container commands when mode is local and state is disabled (#9986 ) Why I did it During warm-reboot and fast-reboot the below error logs appear Feb 3 22:05:15.187408 r-lionfish-13 ERR container: docker cmd: kill for nat failed with 404 Client Error for http+docker://localhost/v1.41/containers/nat/json: Not Found ("No such container: nat") The container command when called for local mode doesn't check if it is enabled before calling docker kill which throws the above errors. `b6ca76b482/scripts/fast-reboot (L699)` How I did it Checking feature state if local mode and returning error exit code along with valid debug message. How to verify it Manually tested with warm-reboot and fast-reboot Added UT to verify it.	2022-03-24 05:14:01 +00:00
xumia	43c32dfa5d	[Build]: Use one debian mirror config (#10274 ) (#10300 ) Why I did it Use one debian mirror config. The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), it does not make sense. All the content in files/image_config/apt is no use, any one wants to add mirror config, please add in files/apt. How I did it Remove files/image_config/apt and the reference.	2022-03-23 22:42:52 +08:00
xumia	37aaed44bd	[Build]: Support to set jobFilters (#10280 ) [Build]: Support to set jobFilters	2022-03-21 02:28:54 +00:00
Samuel Angebault	01fdf5954b	[202106][Arista] Update driver submodules (#10279 ) - add mechanism to power off linecard and fabrics on supervisor reboot (only lc by default) - improve lc interface config script - fix exception handling in logging	2022-03-20 17:32:09 -07:00
Shilong Liu	c14462c6a3	Add a config variable to override default container registry instead of dockerhub. (#10166 ) (#10261 ) * Add variable to reset default docker registry * fix bug in docker version control	2022-03-19 00:11:09 +08:00
Guohan Lu	fd8d80486d	[ci]: build marvel armhf on sonicbld-armhf pool [Guohan Lu] Signed-off-by: Guohan Lu <lguohan@gmail.com>	2022-03-18 14:55:15 +00:00
Guohan Lu	2340071d4a	[ci]: build centec arm64 to sonicbld-arm64 pool Signed-off-by: Guohan Lu <lguohan@gmail.com>	2022-03-15 12:35:50 +00:00
xumia	d75c569ef0	[Build][Ci]: Support to use the cisco sai packages built by azp (#10102 ) Why I did it Support to use the cisco sai packages built by azp	2022-03-08 02:18:58 +00:00
Marty Y. Lok	f94ce408c6	[chassis][supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9042 (#9043 ) Why I did it Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes #9042 How I did it Added database-chassis to the always running docker list if platform is supervisor card. How to verify it Execute the CLI command "sudo monit status container_checker" Signed-off-by: mlok <marty.lok@nokia.com>	2022-03-04 21:49:52 +00:00
wenyiz2021	05e566ed45	Update container_checker for multi-asic devices when state is 'always_enabled' (#10067 ) * Update container_checker for multi-asic devices Update container_checker for multi-asic devices to add database containers in always_running_containers. Previous change was made for single-asic, and that database containers were not considered as feature when writing to state_db. * Update container_checker Update an indent	2022-03-04 21:48:53 +00:00
Samuel Angebault	5419e5de71	Add platform.json configs for all denali SKUs (#9717 )	2022-03-01 20:17:56 +00:00
Junchao-Mellanox	bc9b39a2fe	Stop PMON before swss during warm reboot (#10046 ) - Why I did it Stopping swss and syncd causes some driver module unloading. Those driver modules are depended by PMON. This could trigger ERROR logs in syslog. - How I did it Adjust warmboot shutdown order in make file - How to verify it Manual test	2022-03-01 20:13:09 +00:00
Junchao-Mellanox	74c49a7682	[system-health] Fix file handle leak (#10059 ) - Why I did it swsscommon.ConfigDBConnector does not automatically close connection when the instance is recycled by python. So, it should not create this instance each time calling check_services. It will cause error like Failed to read from file /var/run/hw-management/led/led_status_capability - OSError(24, 'Too many open files') - How I did it Only connect DB once in init - How to verify it Manual test	2022-03-01 20:12:03 +00:00
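The fix described in the commit above (connect once in the initializer instead of on every `check_services` call) can be sketched as follows. `FakeConnector` is a hypothetical stand-in for `swsscommon.ConfigDBConnector`, used only so the sketch runs without a switch; `ServiceChecker` and the `FEATURE` table lookup are likewise illustrative assumptions, not the system-health code.

```python
class FakeConnector:
    """Hypothetical stand-in for a DB connector that holds a file handle."""

    open_count = 0  # counts how many connections were ever opened

    def connect(self):
        FakeConnector.open_count += 1

    def get_table(self, name):
        # A real connector would read the named table from the database.
        return {}


class ServiceChecker:
    """Opens one connection in __init__ and reuses it for every check."""

    def __init__(self, connector_factory=FakeConnector):
        self._db = connector_factory()
        self._db.connect()  # single connection for the object's lifetime

    def check_services(self):
        # No new connector is created here, so repeated calls do not
        # accumulate open handles.
        return self._db.get_table("FEATURE")
```

Calling `check_services` any number of times on one `ServiceChecker` leaves `FakeConnector.open_count` at 1, which is the property the commit relies on to avoid running out of file descriptors.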
Alexander Allen	212cdfbe80	[pmon] Fix chassis_db_init exit not being expected (#9858 ) - Why I did it Error log was shown on switches during boot pmon#supervisord 2021-12-22 04:27:16,709 INFO exited: chassis_db_init (exit status 0; not expected) - How I did it Add exit code zero as an expected exit code and also disable autorestart. - How to verify it Boot the switch and ensure the above log line does not appear.	2022-03-01 03:49:55 +00:00
Alexander Allen	127a93c201	[Mellanox] Add 2x40G support to MSN4700 platform (#9485 ) - Why I did it MSN4700 platform has 8 lanes per port and thus can support 2x40G with each lane running at 10G - How I did it Added 40G to 2x200G breakout mode in platform.json - How to verify it Run config int break Ethernet0 2x40G[200G,100G,50G,25G,10G,1G] And verify the command runs successfully and the port speed was set to 40G with a 2x breakout.	2022-03-01 03:48:25 +00:00
Stephen Sun	84942c10d6	Fix typo and missing files in SN3800 and SN4600C's buffer templates (#9537 ) Why I did it Fix typo and missing files in SN3800 and SN4600C's buffer templates How I did it ingress_lossless_xoff_size => ingress_lossless_pool_xoff add missing files for SN4600C-D100C12S2 How to verify it Deploy the fix and verify whether the device can be up. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-03-01 03:42:27 +00:00
noaOrMlnx	be7f31e6d2	[CoPP] Add always_enabled field (#9302 ) *Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.	2022-03-01 03:40:40 +00:00
Mahesh Maddikayala	87d18eb6c3	[libsaibcm] Fix the expiry date on the libsaibcm debian packages. (#10090 )	2022-02-28 08:52:11 -08:00
Shilong Liu	d05765cc06	[build] Increase vs platform kvm disk size (#10001 ) Info: Attempting file://dev/vdb/onie-installer ... Info: Attempting file://dev/vdb/onie-installer.bin ... cp: write error: No space left on device Failure: local_fs_run():/dev/vdb Unable to copy /tmp/tmp.CPY0ad/onie-installer.bin to tmpfs vs image is failing. Increase kvm device space.	2022-02-25 10:26:46 -08:00
xumia	a3733384bf	[Security]: Upgrade urllib3 to fix CVE-2021-33503 See https://security.archlinux.org/CVE-2021-33503	2022-02-25 09:13:48 +00:00
Mahesh Maddikayala	2153ea41a5	[libsaibcm] Update libsaibcm with PFC patches (#10066 )	2022-02-23 14:24:10 -08:00
xumia	93a22d7df5	[Build]: Fix marvell sai package version parsing issue Fix marvell sai package version parsing issue (#10009)	2022-02-19 04:23:29 +00:00
Nazarii Hnydyn	293ca0b870	[build]: Fix SAE to support debug flavor build. (#9435 ) setup PACKAGE_NAME, PATH, MACHINE, VERSION info for debug docker Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2022-02-09 14:20:22 +08:00
Saikrishna Arcot	aa69f6b94c	[docker-mgmt-framework]: Don't overwrite /etc/passwd and /etc/group with symlinks (#9375 ) Fixes #9376 Because /etc/passwd and /etc/group have been overwritten with symlinks to /host_etc/passwd and /host_etc/group, the debug container build fails. This is because the debug container is built without /etc being mounted at /host_etc in the container (which does happen at runtime). Because of that, /etc/passwd and /etc/group don't exist, which causes some package installation errors when openssh-client tries to create a group. This is a partial revert of `1347f29178`. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-02-08 13:14:41 +08:00
Shilong Liu	0e6eb338eb	Fix rules/functions.generage_manifest. (#9340 ) Why I did it Fix a bug in the sonic debug image build. That bug was introduced in the following PR: #8920	2022-02-07 14:09:58 +08:00
Volodymyr Samotiy	97bd2bf82f	[Mellanox][202106] Update SAI to 1.20.2 and SDK/FW to 4.5.1208/2010.1218 (#9618 ) - Why I did it To include latest fixes. 1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays. 2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-01-26 10:59:39 +02:00
Shilong Liu	f5bef56d5e	[CI] Fix Azure pipeline set -e not work. (#9282 ) In azure pipeline template 'set -e' not works as expected.	2022-01-24 15:26:06 +08:00
Junchao-Mellanox	c0f0694236	[Mellanox] [202106] Optimize thermal policies (#9451 ) - Why I did it Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work. - How I did it Reduce unused thermal policies Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies Move some logic from fan.py to thermal.py to make it more readable - How to verify it 1. Manual test 2. Regression	2022-01-19 11:43:55 +02:00
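The rule stated in the commit above, that the minimum allowed fan speed is the maximum of the speeds expected by all policies, can be sketched in a few lines. The function name, the policy names, and the percentage values are hypothetical and are not taken from the Mellanox thermal code.

```python
def minimum_allowed_fan_speed(expected_by_policy, floor=0):
    """Return the highest fan speed (in percent) requested by any policy.

    expected_by_policy maps a policy name to the speed it expects.
    If no policy is active, fall back to the given floor value.
    """
    return max(expected_by_policy.values(), default=floor)


# Hypothetical example: three policies each request a different speed.
requests = {"asic_temperature": 70, "fan_absence": 100, "psu_absence": 60}
lowest_permitted = minimum_allowed_fan_speed(requests)
```

Taking the maximum means the most demanding policy always wins, so no single policy can lower the fan speed below what another policy considers safe.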
arlakshm	8ef7030f4f	remove staticd.conf.j2 (#9182 ) Why I did it resolves #8979 and #9055 How I did it Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2022-01-19 02:09:31 +00:00
Stephen Sun	c0df856847	Update sonic-swss-common (#9787 ) d00a25bb [ci] refer 202106 branch resources rather than master branch. (#573) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-01-18 22:11:20 +02:00
Stepan Blyshchak	559ce14277	[nvidia] fail the build when hw-mgmt patches do not apply (#9565 ) - Why I did it To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply. - How I did it Removed obsolete patch, made applying patches a hard failure in the build. - How to verify it Run the make and verify patches are applied. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-01-16 08:10:25 +02:00
Qi Luo	c4bc9933fe	Update sonic-swss submodule (#9609 ) Includes below commits ``` 38b616ba 2021-12-09 \| [LGTM] lgtm to use 202106 branches of swss-common and sairedis (#2074) [Stephen Sun] c68bad07 2021-12-07 \| Fix random failure in PR/CI build. #2006 [Shilong Liu] ba17675b 2021-08-23 \| [ci]: fix artifacts download from swss-common and sairedis (#1882) [Guohan Lu] ```	2021-12-30 17:23:38 -08:00
Judy Joseph	1aa225cd0c	Update sonic-utilities submodule 74d2a09 [portstat] check TX/RX utilization calculation correctness (#1840)	2021-12-22 09:09:09 -08:00
Stephen Sun	b479bcd941	[Mellanox] Adjust buffer parameters with 2km cable supported for 4600C non-generic SKUs (#9215 ) - Why I did it Also recalculated all parameters with the latest algorithm with per-speed peer response time taken into account - How I did it Detailed information of each SKU: C64: t0: 32 100G downlinks and 32 100G uplinks t1: 56 100G downlinks and 8 100G uplinks with 2km-cable supported D112C8: 112 50G downlinks and 8 100G uplinks. D48C40: 48 50G downlinks, 32 100G downlinks, and 8 100G uplinks D100C12S2: 4 100G downlinks, 2 10G downlinks, 100 50G downlinks, and 8 100G uplinks 2km cable is supported for C64 on t1 only - How to verify it Run regression test (QoS) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-12-22 09:06:33 -08:00
Stepan Blyshchak	4ebafdaf28	[Mellanox][SDK] Build SDK with PRM sniffer support (#9500 ) - Why I did it To have an ability to use PRM sniffer. - How I did it Enabled the option in configure flags. - How to verify it Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-12-22 09:05:49 -08:00
Vadym Hlushko	79d2a9a225	[Mellanox] [SN4410] [202106] Fixed capability files - port_config.ini, hwsku.json, platform.json (#9541 ) - Why I did it The capability files were incorrect in comparison to the marketing spec of the SN4410 platform. - How I did it Aligned the capability files according to the marketing spec. - How to verify it Did basic manual sanity checks: 1. Check if critical docker containers were UP 2. Check if interfaces were created and were UP 3. Check if interfaces created in the syncd docker container by executing – sx_api_ports_dump.py script 4. Check the logs from the start of the switch – everything was OK 5.Verified the port breakout Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>	2021-12-16 17:39:17 +02:00
Junchao-Mellanox	73a17d8876	[Mellanox] [202106] Fix issue: SFP might not be re-initialized after cable plug in (#9387 ) - Why I did it Fix issue: SFP might not be re-initialized after cable plug in. There could be case like: 1. The SFP object was initialized as QSFP type 2. User use adapter to connect a SFP to a QSFP port 3. The SFP object treat the SFP as QSFP and failed to process EEPROM - How I did it If a new SFP is plugged, always re-initialize the SFP - How to verify it Added unit test case	2021-12-14 16:27:26 +02:00
Junchao-Mellanox	3ae3eeda35	[Mellanox] Allow user to set LED to orange (#9259 ) (#9515 ) Backport #9259 to 202106 - Why I did it Nvidia platform API does not support set LED to orange. - How I did it Allow user to set LED to orange - How to verify it Manual test and unit testing	2021-12-14 13:33:23 +02:00
Stephen Sun	5c91f233ef	[Reclaim buffer][202106] Reclaim unused buffers by applying zero buffer profiles (#9062 ) This is to backport community PR #8768 to 202106 branch Why I did it Support zero buffer profiles Add buffer profiles and pool definition for zero buffer profiles Support applying zero profiles on INACTIVE PORTS Enable dynamic buffer manager to load zero pools and profiles from a JSON file Signed-off-by: Stephen Sun stephens@nvidia.com How I did it Add buffer profiles and pool definition for zero buffer profiles If the buffer model is static: - Apply normal buffer profiles to admin-up ports - Apply zero buffer profiles to admin-down ports If the buffer model is dynamic: - Apply normal buffer profiles to all ports - buffer manager will take care when a port is shut down - Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists Originally, all the macros to generate the above buffer objects took active ports only as an argument Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called Only vendors who support zero profiles need to change their buffer templates Enable buffer manager to load zero pools and profiles from a JSON file: The JSON file is provided on a per-platform basis It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created. 
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files: One in Mellanox-SN2700-D48C8 for single ingress pool mode The other in ACS-MSN2700 for double ingress pool mode Those files of all other SKUs will be symbol link to the above files Update sonic-cfggen test accordingly: - Adjust example output file of JSON template for unit test - Add unit test in for Mellanox's new buffer templates. How to verify it Regression test. Unit test in sonic-cfggen Run regression test and manually test.	2021-12-13 10:51:50 -08:00
Samuel Angebault	08c2c07fc0	[202106][Arista] Update arista platform library (#9483 )	2021-12-09 18:29:59 -08:00
Arvindsrinivasan Lakshmi Narasimhan	e2b8e2d1da	submodule update swss Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-12-08 00:25:09 +00:00
Vivek Reddy	4856f98716	[Mellanox] [SKU] Fix the shared headroom for 4600C-C64 SKU (#8242 ) Removed ingress_lossy_pool from the BUFFER_POOL list Fix the egress_lossless_pool_size value Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2021-12-08 00:23:07 +00:00
Volodymyr Samotiy	b22db5a52b	[Mellanox] [202106] Update SAI to v1.20.0.1 and SDK/FW to v4.5.1156/v2010.1152 (#9431 ) - Why I did it To include latest fixes. SAI * Reduce verbosity of warning message on shared memory already existing * accuflow allocation support by key value SDK * Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected. * In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds. * Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times * When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped. * When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior. * Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'. * Tying the SCL and SDA of the optical modules to 3.3V causes errors. * On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports. * While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed. * In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. * The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well. * When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow. * Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash. 
* Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. * Fixed a memory leak in sx_api_port_sflow_statistics_get API. * During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2021-12-06 21:54:14 +02:00
Mahesh Maddikayala	3f3dceb96a	[broadcom]: update bcm dnx gpl module pointer (#9442 ) saibcm_modules_dnx submodule update contains a fix for kernel crash	2021-12-03 21:21:07 -08:00
Judy Joseph	70b24ad9c5	Update sonic-utilities submodule 9514857 [config reload][202106] Update command reference (#1944)	2021-12-01 19:21:45 -08:00


5220 Commits