Why I did it
Ubuntu 22.04 uses cgroup2 by default, but docker.sh doesn't mount it.
As a result we get an error when trying to run docker info in chroot env:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
How I did it
mount cgroup2 in chroot if all enabled kernel cgroup controllers are currently not in use by cgroup1
So we need to mount cgroup in chroot environment on /sys/fs/cgroup.
Because inside chroot we don't know which cgroup version is used by the host we have two possible solutions:
cgroup tree for chroot is mounted by the host (it was my 1st version of this fix)
cgroup tree is mounted inside chroot based on info from /proc/cgroups (it's current version of this fix)
My 2nd version based on this code from systemd: 5c6c587ce2/src/shared/cgroup-setup.c (L35-L74)
We parse info from /proc/cgroups
Skip header line started from #
Skip controller if it's disabled (4th column = 0)
Count number of controllers with non-zero of hierarchy_id (2nd column)
If this number is not zero then we assume some of controllers are used by host system and the host system uses hybrid or legacy cgroup tree. In this case we can't use unified cgroup tree inside chroot and mount old cgroup tree (v1).
If this number is zero then we assume host system uses unified cgroup tree and we need to mount cgroup2 inside chroot.
Signed-off-by: Konstantin Vasin <k.vasin@yadro.com>
Why I did it
On DNX (J2/J2c/J2c+) platforms, Single Path Nexthops and ECMp Nexthop resources(FECs) are shared. BRCM SAI do not have partition of this resource, and hence more single path Nexthop entries, causes ECMP programming to fail in scaled setup.
How I did it
Broadcom provided SAI changes to reserve resources for single path nexthop entries(More details in CSP: https://brcmsemiconductor-csm.wolkenservicedesk.com/wolken-support/allcases/request-details?requestId=CS00012251649).
Along with SAI changes, they provided configurable Macro/flag to reserve NON_ECMP entries.
This PR is to add that flag in various sai.profile files wherever applicable.
PS: We are reserving 3072 single path Nexthop entries on each Linecard. Calculation is as follows.
Max Slots per chassis: 8
Max No of Ports(each LC): 64
MyIP/Subnet Entries per port: 4(v4/v6)
Nbr Entries Per port: 2(v4/v6)
Total Non_ECMP Count: 8x64x(4+2) = 3072
How to verify it
Without this change, the ECMP group count will be shown as Max_count in 'crm show resources all' command, and with this change the ECMP group count will be decreased by 24(3072/128).
Port indexes of front panel ports are not contiguous in multi-asic because we didn't distiguish between
front panel and internal ports, e.g., recycle ports. Fix this by assigning index to front panel port first
and then internal ports.
Why I did it
Provide CPLD and FPGA driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Fix#12279
Why I did it
Curl can fail when we calculate md5sum of web package.
E.g. if server responsed with 503 error.
But we don't validate this and pass any output from curl directly to md5sum.
After that we save incorrect md5 hash to versions-web file.
How I did it
use option --retry 5 for transient errors (default value is 0)
use option -f for curl and set -o pipefail for shell to detect errors
stop build if curl failed
Signed-off-by: Konstantin Vasin <k.vasin@yadro.com>
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.
How I did it
add coalesce-time 10000 to the frr bgp startup config.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
MarkDown linter "mdl" reports many warnings on README.md.
Let them get fixed to ease its maintenance and readability.
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
- Running "sudo pip install" can be OK in a CI but generates a clear
warning. It is strongly disadvised to run it in a user or a production
environment because it can break the host packages system.
"pip install --user" or venv are usually prefered in most situations.
- Nested virtualization support is not always enough to build OVS image
inside a VM. The full KVM interface must be exposed, what may require
extra configuration such as the KVM paravirtualization in VirtualBox.
- On the recommended version of Ubuntu (20.0.4), installing docker CE
with the given process does not remove docker from snap
when a previous installation of the distribution package docker.io
was already present in the system.
Using docker through snap currently triggers a bug during the SONiC
build process.
https://stackoverflow.com/questions/52526219/docker-mkdir-read-only-file-system
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Why I did it
It is to fix the broadcom build failure, it is caused by the build image docker-dhcp-relay:latest not found.
2022-12-14T00:09:57.5464893Z [ FAIL LOG START ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5466036Z Attempting docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5467113Z Obtained docker image lock for docker-dhcp-relay load
2022-12-14T00:09:57.5468206Z Loading docker image target/docker-dhcp-relay.gz
2022-12-14T00:09:57.5469361Z Loaded image: docker-dhcp-relay:internal.65852159-11ad82a07a
2022-12-14T00:09:57.5470686Z Tagging docker image docker-dhcp-relay:latest as docker-dhcp-relay-sonic:latest
2022-12-14T00:09:57.5471997Z Error response from daemon: No such image: docker-dhcp-relay:latest
2022-12-14T00:09:57.5473122Z [ FAIL LOG END ] [ target/docker-dhcp-relay.gz-load ]
2022-12-14T00:09:57.5539792Z make: *** [slave.mk:1180: target/docker-dhcp-relay.gz-load] Error 1
2022-12-14T00:09:57.5540958Z make: *** Waiting for unfinished jobs....
The image had been built succeeded
2022-12-13T17:01:59.9046935Z [ finished ] [ target/docker-eventd.gz ]
2022-12-13T17:02:00.4947165Z [ building ] [ target/docker-dhcp-relay.gz ]
2022-12-13T17:02:00.6688627Z /sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
2022-12-13T17:02:41.1123955Z /sonic
2022-12-13T17:07:04.1786069Z [ finished ] [ target/docker-dhcp-relay.gz ]
But it was tagged by another value:
Obtained docker image lock for docker-dhcp-relay save
Tagging docker image docker-dhcp-relay-sonic:latest as docker-dhcp-relay:internal.65852159-11ad82a07a
Saving docker image docker-dhcp-relay:internal.65852159-11ad82a07a
Released docker image lock for docker-dhcp-relay save
Removing docker image docker-dhcp-relay-sonic:latest
Untagged: docker-dhcp-relay-sonic:latest
target/docker-dhcp-relay.gz
File /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz saved in cache
[ CACHE::SAVED ] /dpkg_cache/docker-dhcp-relay.gz-2ddfa01a109ca69b7621f1a-450bae36026d9dee62646f2.tgz
How I did it
When the feature SONIC_CONFIG_USE_NATIVE_DOCKERD_FOR_BUILD not enabled, always save as the latest tag, not use the specify version.
The version is dynamic, it is changed when a new commit checked in, but the image of docker-dhcp-relay is not necessary to change.
Why I did it
Update ECMP calculator README file with new instructions how to run the calculator.
How I did it
Update README file.
How to verify it
Read README file.
docker-sonic-vs doesn't have the infra needed for the syslog rate limit
configuration, so it's not going to be rendering jinja templates to
overwrite /etc/rsyslog.conf. This also means that syslog messages would
get logged twice (because both the default /etc/rsyslog.conf file and
/etc/rsyslog.d/50-default.conf are telling it to log to syslog).
Therefore, keep the custom static /etc/rsyslog.conf file for docker-sonic-vs.
Fixessonic-net/sonic-swss#2570.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
How I did it
1、 demo driver will call the s3ip kernel framework interface
How to verify it
run the demo ,it will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
The user framework module complies with s3ip sysfs specification
How I did it
1、 create a s3ip_sysfs service
2、 the s3ip_sysfs service call the “s3ip_sysfs_tool.sh” to install kernel module and run s3ip_load.py
3、 s3ip_load.py will parse the s3ip_sysfs_conf.json configuration file and create /sys_switch/ directory
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide slot and switch_rootsysfs driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide SYSLED and watchdog driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a sensor driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a transceiver driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a Fan driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Provide a PSU driver framework that complies with s3ip sysfs specification
How I did it
1、 The framework module provides register and unregister interface and implementation.
2、 The framework will help you create the sysfs node
How to verify it
A demo driver base on this framework will display the sysfs node wich conform to the s3ip sysfs specification
Why I did it
Platform interface doesn't provide all sensors and using it isn't effective
How I did it
Request sensors via http from BMC server and parse the result
How to verify it
Related daemon in pmon populates redis db, run this command to view the contents
Update sonic-utilities submodule pointer to include the following:
208824d YANG Validation for ConfigDB Updates: WARM_RESTART, SFLOW_SESSION, SFLOW, VXLAN_TUNNEL, VXLAN_EVPN_NVO, VXLAN_TUNNEL_MAP, MGMT_VRF_CONFIG, CABLE_LENGTH, VRF tables (#2526)
09b8dd1 [db_migrator] Remove import of swsssdk as it is not supported in master (#2544)
10eb5ba Support syslog rate limit configuration for containers and host (#2454)
ca9a020 [generate_dump] [Mellanox] Fix the duplicate dfw dump collection problem by adding symlinks (#2536)
92c7001 [config] Add check in config interface ip command to block if the interface is portchannel member (#2539)
e8130f5 [system-health] Improve code structure of system health CLIs (#2453)
00c01b3 Transceiver eeprom dom CLI modification to show output from TRANSCEIVER_DOM_THRESHOLD table (#2535)
42f51c2 sonic-utilities: Update config reload() to verify formatting of an input file (#2529)
a5e1e2b [GCU] Add RemoveCreateOnlyDependency Validator/Generator (#2500)
6411b52 [QoS] Introduce delay to the qos reload flow (#2503)
fce7ec3 Use github code scanning instead of LGTM (#2530)
91bd6de Change show kube command default value of insecure key to True (#2517)
c44c584 Add db_migrator_constants.py script to setup.py (#2534)
6a3238e [drop counters] Fix CLI script for unconfigured PGs (#2518)
263810b Update vrf add, del commands for duplicate/non-existing VRFs (#2467)
addae73 Port 202012 DB migration changes to newer branches (#2515)
2af8cfa [VXLAN]Fixing traceback in show remotemac when mac moves during command execution (#2506)
- Why I did it
Remove TODO comments which are no longer needed
- How I did it
Remove TODO comments which are no longer needed
- How to verify it
Only comment change
Add missing aggregate port_config.ini needed by sonic-mgmt
Concatenate the ASIC specific port_config.ini from device/arista/x86_64-arista_7800r3a_36d2_lc/Arista-7800R3A-36D2-C36/[01] to create the aggregate file.
- Why I did it
This optimization is needed for DPU SONiC. DPU SONiC runs a limited set of containers and teamd and radv containers are not part of them. Unlike the other containers, there was no possibility to disable teamd and radv containers compilation.
To reduce DPU SONiC compilation time and reduce the image size this commit adds the possibility to disable their compilation.
- How I did it
Two new configuration options are added to rules/config file:
INCLUDE_TEAMD
INCLUDE_ROUTER_ADVERTISER
By default to preserve the existing behavior both options are enabled. There are two ways to override them:
To change option value to "n" in rules/config file.
To override their value using SONIC_OVERRIDE_BUILD_VARS env variable:
SONIC_OVERRIDE_BUILD_VARS="SONIC_INCLUDE_TEAMD=y SONIC_INCLUDE_ROUTER_ADVERTISER=n"
- How to verify it
The default behavior is preserved. To verify it compile the image without overriding new options. Install the image and verify that both teamd and radv containers are present and running.
To verify the new options override them with "n" value. Compile and install image. Verify that no docker containers are present. Verify that SWSS can start without errors.
platform-daemon:
657a26de312d1eb61f15d13953ec1cd09634443 (HEAD, origin/master, origin/HEAD, master) [thermalctld] fix some redundant removal of state DB tables (#315)
56046dc36907c7e873911ef60e9193fe8717b12c Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS (#304)
adcd69beb637aaf109573582a96bdeca82c8d1f0 Create TRANSCEIVER_DOM_THRESHOLD table in state DB (#320)
0573416ef546109849e0851d48ec1380426f7ef5 Remove the argument that is causing the xcvrd to crash (#318)
platform-common:
8f2dffb9d7708d05823462e9e643965103989d0d (HEAD, origin/master, origin/HEAD, master) Add get_transceiver_status and get_transceiver_pm to API interface (#315)
bf2ca02e06c93be9617cd0626049f7439b2192c1 [syseeprom] Remove the trailing space in the value of VENDOR_EXT field in the eepromTlvInfo decode (#333)
580357f740920671e9ca98dc0d1249537bddcf1d [Ci] Upgrade to bullseye and fix the branch reference issue (#331)
4f1722500b229fd3fd0b5e3a34686a00590af0a4 Use github code scanning instead of LGTM (#328)
ce9aacb628c5de7632e533deb008c012e0b9c40d EEPROM/DOM Info: The Compliance Code will show "unknown" by using FINISAR 10G LR XCVR (#319)
utilities:
208824d3202445e5d51c6ab6e5abeeb9c5483c1f (HEAD, origin/master, origin/HEAD, master) YANG Validation for ConfigDB Updates: WARM_RESTART, SFLOW_SESSION, SFLOW, VXLAN_TUNNEL, VXLAN_EVPN_NVO, VXLAN_TUNNEL_MAP, MGMT_VRF_CONFIG, CABLE_LENGTH, VRF tables (#2526)
09b8dd1333c84e9993234e017e2809d948c47c40 [db_migrator] Remove import of swsssdk as it is not supported in master (#2544)
10eb5ba8e3af26695eb4f00ddaf70b6be60a73b1 Support syslog rate limit configuration for containers and host (#2454)
ca9a02033f6609993a779d26a9da1b123a1115f6 [generate_dump] [Mellanox] Fix the duplicate dfw dump collection problem by adding symlinks (#2536)
92c70011307670aba6b73ef571f0e8d966ab62e3 [config] Add check in config interface ip command to block if the interface is portchannel member (#2539)
e8130f58bb66040a5c25435382e3c3df4bd0618b [system-health] Improve code structure of system health CLIs (#2453)
00c01b37c759283d3e8fa201ec94310b33ce7aab Transceiver eeprom dom CLI modification to show output from TRANSCEIVER_DOM_THRESHOLD table (#2535)
42f51c26d1d0017f3211904ca19c023b5d784463 sonic-utilities: Update config reload() to verify formatting of an input file (#2529)
a5e1e2b43e4c8fdb81307c49a8eb7b4db726758d [GCU] Add RemoveCreateOnlyDependency Validator/Generator (#2500)
6411b52e5e83837d731aed15b793d9df4277a47a [QoS] Introduce delay to the qos reload flow (#2503)
fce7ec32f5c07e9f017f15aa6790534f8596ef7b Use github code scanning instead of LGTM (#2530)
91bd6dee75d251dff72618b442376b537d6d3100 Change show kube command default value of insecure key to True (#2517)
c44c584f77577638460aaec78af1a3327aa8b4a5 Add db_migrator_constants.py script to setup.py (#2534)
6a3238e69062033159711ee6d4a3a8e39849f0c7 [drop counters] Fix CLI script for unconfigured PGs (#2518)
263810b25d12dc2435406d57245a113f7e9688c8 Update vrf add, del commands for duplicate/non-existing VRFs (#2467)
addae730177555c1a5d276e93b2610833604e5b8 Port 202012 DB migration changes to newer branches (#2515)
2af8cfa428af29551bdbdf3e44bbfe4fea4561b2 [VXLAN]Fixing traceback in show remotemac when mac moves during command execution (#2506)
Signed-off-by: Mihir Patel <patelmi@microsoft.com>
This feature caches all the deb files during docker build and stores them
into version cache.
It loads the cache file if already exists in the version cache and copies the extracted
deb file from cache file into Debian cache path( /var/cache/apt/archives).
The apt-install always installs the deb file from the cache if exists, this
avoid unnecessary package download from the repo and speeds up the overall build.
The cache file is selected based on the SHA value of version dependency
files.
Why I did it
How I did it
How to verify it
* 03.Version-cache - framework environment settings
It defines and passes the necessary version cache environment variables
to the caching framework.
It adds the utils script for shared cache file access.
It also adds the post-cleanup logic for cleaning the unwanted files from
the docker/image after the version cache creation.
* 04.Version cache - debug framework
Added DBGOPT Make variable to enable the cache framework
scripts in trace mode. This option takes the part name of the script to
enable the particular shell script in trace mode.
Multiple shell script names can also be given.
Eg: make DBGOPT="image|docker"
Added verbose mode to dump the version merge details during
build/dry-run mode.
Eg: scripts/versions_manager.py freeze -v \
'dryrun|cmod=docker-swss|cfile=versions-deb|cname=all|stage=sub|stage=add'
* 05.Version cache - docker dpkg caching support
This feature caches all the deb files during docker build and stores them
into version cache.
It loads the cache file if already exists in the version cache and copies the extracted
deb file from cache file into Debian cache path( /var/cache/apt/archives).
The apt-install always installs the deb file from the cache if exists, this
avoid unnecessary package download from the repo and speeds up the overall build.
The cache file is selected based on the SHA value of version dependency
files.
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first.
#### Why I did it
1. `eval()` - not secure against maliciously constructed input, can be dangerous if used to evaluate dynamic content. This may be a code injection vulnerability.
2. `subprocess()` - when using with `shell=True` is dangerous. Using subprocess function without a static string can lead to command injection.
3. `os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content.
4. `is` operator - string comparison should not be used with reference equality.
5. `globals()` - extremely dangerous because it may allow an attacker to execute arbitrary code on the system
#### How I did it
1. `eval()` - use `literal_eval()`
2. `subprocess()` - use `shell=False` instead. use an array string. Ref: [https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation](https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation)
3. `os` - use with `subprocess`
4. `is` - replace by `==` operator for value equality
5. `globals()` - avoid the use of globals()
Debian is shipping a systemd timer unit for logrotate, but we're also
packaging in a cron job, which means both of them will run, potentially
at the same time. Remove our cron file, and add an override to the
shipped timer file to have it be run every 10 minutes.
Fixes#12392.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
As described in detail in #12753, the current FRR patch 0009-ignore-route-from-default-table.patch is causing unwanted FRR/zebra error logs. This change gets rid of the error messages for routes from kernel default table while these routes are ignored in prefix encoding.
How I did it
This fix updates the original 0009 patch by checking if the routes are from table default before printing the error logs. The original patch checks the same condition and ignores the routes from table default in prefix encoding.
How to verify it
Follow the steps to repro as described in #12753.
Also verify the test case ipfwd/test_nhop_count.py no longer fails due to the error messages.
Signed-off-by: Stephen Xu <stexu@linkedin.com>
Why I did it
Initial implementation of Watchdog platform plugin for BMC-based boards
How I did it
How to verify it
Run platform_tests/test_reload_config.py
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
Why I did it
Fixes#12575 and #12575
How I did it
In the PR sonic-net/sonic-platform-daemons#311 chassisd updates to CHASSIS_FABRIC_ASIC_INFO with the fabric asic info.
Updating the asic_status.py to read from the correct table.
How to verify it
test on chassis
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
*Replaced BRCM SDK's psample support flag(PSAMPLE_SUPPORT) with linux kernel psample module support config flag(CONFIG_PSAMPLE) in saibcm-modules.
*Replaced BUILD_PSAMPLE conditioanl check with CONFIG_PSAMPLE to build psample callback library(psample-cb.o), only if psample config is enabled in linux kernel.
*Cleaned up PSAMPLE_SUPPORT related commented code.
Signed-off-by: haris@celestica.com
Signed-off-by: haris@celestica.com
This change is to disable the pcie firmware check done by Broadcom SAI. The change is needed for the Arista platform x86_64-arista_7050cx3_32s; otherwise, the check will fail, blocking the initialization.
There was a pcie firmware check added in brcm SDK and certain Arista hardwares do not compliant with the check, so we added the disable_pcie_firmware_check originally for x86_64-arista_7060dx4_32. For x86_64-arista_7050cx3_32s, it was able to pass the check but some firmware change done in August made it fail.
Why I did it
SIGTERM takes more than 10 seconds to be processed, so psud is stopped by SIGKILL, this causes unexpected behavior since data base is not cleared
How I did it
Decorate get_presence api to cancel it on SIGTERM signal in order to avoid long processing.
How to verify it
test_pmon_psud_stop_and_start_status
test_pmon_psud_term_and_start_status
PR #12829 modified the docker tagging scheme such that optional docker
containers would be tagged with the SONiC image version. However, the
docker-image-load macro wasn't updated for these changes. Update it
here.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>