Commit Graph

122 Commits

Author SHA1 Message Date
Ze Gan
9f08f88a0d
[dpu]: Add DPU database service (#17161)
Sub PRs:

sonic-net/sonic-host-services#84
#17191

Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.

Microsoft ADO (number only): 25072889

How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-17 09:10:03 -08:00
Kebo Liu
31451295d5
Add special rsyslog filter for MSN2700 platform (#16684)
- Why I did it
Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely.

- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf

- How to verify it
run regression on the MSN2700 platform to make the error log will not be printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-10-24 17:54:44 +03:00
Vadym Hlushko
3bd396043e
[buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-03 08:35:57 -07:00
Vaibhav Hemant Dixit
07f8507911
[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733)
Why I did it
Fix: #16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR #15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
2023-09-28 09:37:21 -07:00
abdosi
2ce04ab91d
[chassisd]: Add alternate to the bridge interface created on chassis supervisor. (#16505)
Add alternate name eth1-midplane to Linux bridge br1 created on supervisor on some chassis platforms.
See description here: #16504

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-09-23 00:27:21 -07:00
Yaqiang Zhu
76b7cb8b64
[dhcp_server] Add dhcp_server container (#14031)
Why I did it
Add dhcp_server ipv4 feature to SONiC.
HLD: sonic-net/SONiC#1282

How I did it
To be clarify: This container is disabled by INCLUDE_DHCP_SERVER = n for now, which would cause container not build.

Add INCLUDE_DHCP_SERVER to indicate whether to build dhcp_server container
Add docker file for dhcp_server, build and install kea-dhcp4 inside container
Add template file for dhcp_server container services.
Add entry for dhcp_server to FEATURE table in config_db.
How to verify it
Build image with INCLUDE_DHCP_SERVER = y to verify:

Image can be install successfully without crush.
By config feature state dhcp_server enabled to enable dhcp_server.
2023-09-11 09:15:56 -07:00
Vivek
0652991eb8
Run db_migrator for non first-time reboots (#16116)
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-08-22 18:36:38 +03:00
Vaibhav Hemant Dixit
e127701660
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
2023-08-04 16:00:26 -07:00
xumia
a0b3ec2df6
Support FIPS DB configuration (#15632)
Why I did it
Support FIPS DB configuration
Design Doc: sonic-net/SONiC#1372

Work item tracking
Microsoft ADO (number only): 24411148
How I did it
Add the FIPS Yang model to make FIPS configurable in ConfigDB.

How to verify it
See TestPlan: sonic-net/sonic-mgmt#9092
Build the image and run the tests: sonic-net/sonic-mgmt#9091
2023-07-28 16:54:02 +08:00
Vaibhav Hemant Dixit
ddb3086620
Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684)
This reverts commit 9649a44470.
2023-07-06 17:34:35 -07:00
lixiaoyuner
ca29197184
Move k8s script to docker-config-engine (#14788)
Why I did it
To reduce the container's dependency from host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-05 14:44:48 -07:00
Stepan Blyshchak
1ebdcda9e3
[nvidia] make sure shared storage with syncd is cleared on restarts (#14547)
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-06-28 15:26:49 -07:00
Vaibhav Hemant Dixit
9649a44470
Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)
This reverts commit 02b17839c3.

Reverts #14933

The earlier commit caused a race condition that particularly broke cross branch warm upgrade.

Issue happens when db_migrator is still migrating the DB and finalizer is checking DB for list of components to reconcile.

If migration is not complete, finalizer get an empty list to wait for. Due to this, finalizer concludes warmboot (deletes system wide warmboot flag) and cause all the services to do cold restart.

ADO: 24274591
2023-06-16 13:58:38 -07:00
Vaibhav Hemant Dixit
02b17839c3
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)
Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
2023-05-30 10:16:21 -07:00
Stephen Sun
9e56fea091
Temporary WA for the issue that asic_table.json can not be rendered (#13888)
- Why I did it
We suspect the issue #13791 is caused by redis server being temporarily unavailable during system initialization so we do not use -d in sonic-cfggen, for now, to avoid accessing redis server

- How I did it
Provide a string containing required json data when calling sonic-cfggen

- How to verify it
Manually test it

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-04-24 17:02:35 +03:00
Stephen Sun
152148fb81
Enhance the error message output mechanism (#14384)
#### Why I did it

Enhance the error message output mechanism during swss docker creating

#### How I did it

Capture the output to stderr of `sonic-cfggen` and output it using `echo` to make sure the error message will be logged in syslog.

#### How to verify it

Manually test
2023-04-07 14:23:35 -07:00
Marty Y. Lok
2c22d9affc
[Chassis][multiasic] Fix the sonic-db-cli core files issue on multiasic platform after the c++ implementation of sonic-db-cli (#13207)
Fixe #12047. After the c++ implementation of the sonic-db-cli, sonic-db-cli PING command tries to initialize the global database for all instances database starting. If all instance database-config.json are not ready yet. it will crash and generate core file. PR sonic-net/sonic-swss-common#701 only fix the crash and the process abortion. 

Signed-off-by: mlok <marty.lok@nokia.com>
2023-02-21 11:23:22 -08:00
Chun'ang Li
eea54717b8
Fix rsyslogd start failed cause by rsyslog.conf is emtpy. (#13669)
- Why I did it
In to-sonic and multi-asic KVM-test, pretest sometimes failed. Reason is rsyslogd process can not start in teamd container. Because rsyslog.conf is empty caused by sonic-cfggen execute failed

- How I did it
If sonic-cfggen -d execute failed, execute without -d because the template file has the default value.

- How to verify it
Build image and test it over 40 times, all passed pretest.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
2023-02-06 16:38:04 +02:00
Sudharsan Dhamal Gopalarathnam
1ff0c0b685
[Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533)
- Why I did it
Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker

- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin

- How to verify it
Manual UT and new sonic-mgmt tests
2023-02-05 16:45:49 +02:00
Richard.Yu
a096363b48
[broadcom]: Set default SYNCD_SHM_SIZE for Broadcom XGS devices (#13297)
After upgrade to brcmsai 8.1, the sdk running environment (container) recommended with mininum memory size as below

TH4/TD4(ltsw) uses 512MB
TH3 used 300MB
Helix4/TD2/TD3/TH/TH 256 MB
Base on this requirement, adjust the default syncd share memory size and set the memory size for special ACISs in platform_env.conf file for different types of Broadcom ASICs.

How I did it
Add the platform_env.conf file if none of it for broadcom platform (base on platform_asic file)
Add the 'SYNCD_SHM_SIZE' and set the value

for ltsw(TD4/TH4) devices set to 512M at least (update the platform_env.conf)
for Td2/TH2/TH devices set to 256M
for TH3 set to 300M

verify

How to verify it
verify the image with code fix
Check with UT
Check on lab devices

On a problematic device which cannot start successfully
Run with the command
$ cat /proc/linux-kernel-bde
Broadcom Device Enumerator (linux-kernel-bde)
Module parameters:
        maxpayload=128
        usemsi=0
        dmasize=32M
        himem=(null)
        himemaddr=(null)
DMA Memory (kernel): 33554432 bytes, 0 used, 33554432 free, local mmap
No devices found
$ docker rm -f syncd
syncd
$ sudo /usr/bin/syncd.sh start
Cannot get Broadcom Chip Id. Skip set SYNCD_SHM_SIZE.
Creating new syncd container with HWSKU Force10-S6000
a4862129a7fea04f00ed71a88715eac65a41cdae51c3158f9cdd7de3ccc3dd31
$ docker inspect syncd | grep -i shm
            "ShmSize": 67108864,
                "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",
On Normal device
$ docker inspect syncd | grep -i shm
            "ShmSize": 268435456,
                "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e"
change the config syncd_shm.ini to b85=128m

$ docker rm -f syncd
syncd
$ sudo /usr/bin/syncd.sh start
Creating new syncd container with HWSKU Force10-S6000
3209ffc1e5a7224b99640eb9a286c4c7aa66a2e6a322be32fb7fe2113bb9524c
$  docker inspect syncd | grep -i shm
            "ShmSize": 134217728,
                "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",
change the config under
/usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/Force10-S6000/platform_env.conf
and run command

$ cat /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/platform_env.conf
SYNCD_SHM_SIZE=300m

$ sudo /usr/bin/syncd.sh start
Creating new syncd container with HWSKU Force10-S6000
897f6fcde1f669ad2caab7da4326079abd7e811bf73f018c6dacc24cf24bfda5
$  docker inspect syncd | grep -i shm
            "ShmSize": 314572800,
                "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e",

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2023-01-30 20:23:03 -08:00
Junchao-Mellanox
3b3837a636
[containercfgd] Add containercfgd and syslog rate limit configuration support (#12489)
* [containercfgd] Add containercfgd and syslog rate limit configuration support

* Fix build issue

* Fix checker issue

* Fix review comment

* Fix review comment

* Update containercfgd.py
2022-12-08 08:58:35 -08:00
Aryeh Feigin
2c10ebb4fe
Use warm-boot infrastructure for fast-boot (#11594)
This PR should be merged together with the sonic-utilities PR (sonic-net/sonic-utilities#2286) and sonic-sairedis PR (sonic-net/sonic-sairedis#1100).

Use redis contents from dump file in fast-reboot.

Improve fast-reboot flow by utilizing the warm-reboot infrastructure.
This followes https://github.com/sonic-net/SONiC/blob/master/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md
2022-09-26 09:01:49 -07:00
Stepan Blyshchak
e662008f72
[services] kill container on stop in warm/fast mode (#10510)
- Why I did it
To optimize stop on warm boot.

- How I did it
Added kill for containers
2022-09-19 19:34:33 +03:00
Maxime Lorrillere
0a7dd50dcb
[Chassis][Voq]Configure midplane network on supervisor (#11725)
Multi-asic Docker instances are created behind Docker's default bridge
which doesn't allow talking to other Docker instances that are in the
host network (like database-chassis).

On linecards, we configure midplane interfaces to let per-asic docker
containers talk to CHASSIS_DB on the supervisor through internal chassis
network.

On the supervisor we don't need to use chassis internal network, but we
still need a similar setup in order to allow fabric containers to talk
to database-chassis
2022-09-15 17:23:41 -07:00
Zain Budhwani
6a54bc439a
Streaming structured events implementation (#11848)
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
2022-09-03 07:33:25 -07:00
Sudharsan Dhamal Gopalarathnam
5dc4eb9693
[vs]Preventing ebtables cfg to be applied on vs (#11585)
*Preventing ebtables rules to be applied on KVM image. The ebtables rules in SONiC are added to prevent ARP as well as L2 forwarding to be blocked in linux kernel since the hardware will take care of the actual L2 forward. However this is not the case with KVM where linux needs to forward even L2 packets
2022-08-04 09:18:00 -07:00
andywongarista
88d0ce5ce8
Add gbsyncd container for broncos (#11154)
* Add docker-gbsyncd-broncos support
* Address review comments
* Add socket to gbsyncd
* Upgrade gbsyncd-broncos to bullseye
2022-07-18 10:57:27 +08:00
abdosi
0285bfe42e
[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500)
What/Why I did:

Issue1: By setting up of ipvlan interface in interface-config.sh we are not tolerant to failures. Reason being interface-config.service is one-shot and do not have restart capability. 

Scenario: For example if let's say database service goes in fail state  then interface-services also gets failed because of dependency check but later database service gets restart but interface service will remain in stuck state and the ipvlan interface nevers get created.

Solution: Moved all the logic in database service from interface-config service which looks more align logically also since the namespace is created here and all the network setting (sysctl) are happening here.With this if database starts we recreate the interface.

Issue 2: Use of IPVLAN vs MACVLAN

Currently we are using ipvlan mode.  However above failure scenario is not handle correctly by ipvlan mode. Once the ipvlan interface is created and ip address assign to it and if we restart interface-config or database (new PR) service Linux Kernel gives error "Error: Address already assigned to an ipvlan device."  based on this:https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978Reason being if we do not do cleanup of ip address assignment (need to be unique for IPVLAN)  it remains in Kernel Database and never goes to free pool even though namespace is deleted. 

Solution: Considering this hard dependency of unique ip macvlan mode is better for us and since everything is managed by Linux Kernel and no dependency for on user configured IP address.

Issue3: Namespace database Service do not check reachability to Supervisor Redis Chassis   Server.

Currently there is no explicit check as we never do Redis PING from namespace to Supervisor Redis Chassis  Server. With this check it's possible we will start database and all other docker even though there is no connectivity and will hit the error/failure late in cycle

Solution: Added explicit PING from namespace that will check this reachability.

Issue 4:flushdb give exception when trying to accces Chassis Server DB over Unix Sokcet.

Solution: Handle gracefully via try..except and log the message.
2022-05-24 16:54:12 -07:00
Marty Y. Lok
23f9126f59
[VoQ][config] Multiasic Supervisor card fails to load config_db#.json in chassis when system is reboot (#10106)
Supervisor card fails to load config_db#.json in chassis when system reboot. 
This is an intermittent issue, fixes #10105
2022-05-09 11:06:11 -07:00
Yakiv Huryk
d9117d9411
[Mellanox][asan] add address sanitizer support for syncd (#10266)
Why I did it
To support address sanitizer for Mellanox syncd

How I did it
/var/log/asan is mapped for syncd container (the same as for swss)
container stop() has a timeout (60s) for syncd (the same as for swss)
This is so libasan has enough time to generate a report.
added ASAN's log path to Mellanox syncd supervisord.conf
added "asan: yes" to sonic_version.yml
How to verify it
Added artificial memory leaks
Compiled with ENABLE_ASAN=y
Installed the image on DUT
Rebooted the DUT
Verified that /var/log/asan/syncd-asan.log contains the leaks

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2022-04-14 15:00:32 -07:00
byu343
f7a6553933
[docker-syncd]: Add optional shm-size to syncd container (#10516)
Why I did it
In the bringup of tomahawk4/trident4, we realized that such chips need a larger size of /dev/shm in syncd container, so we added the option --shm-size to the docker create for syncd. The default value for shm-size is 64m; after this change, people can add SYNCD_SHM_SIZE=128m to platform_env.conf to change it to 128m.

How to verify it
We verified that after this change, 1) on existing platforms without platform_env.conf, the size of /dev/shm in syncd container (df -h | grep shm) is still the default 64M; 2) after we add SYNCD_SHM_SIZE=128m to platform_env.conf, /dev/shm in syncd becomes 128M.
2022-04-09 10:47:18 -07:00
Oleksandr Ivantsiv
25a0ce5eb1
[asan] Add address sanitizer support. (#9857)
Implement infrastructure that allows enabling address sanitizer
for docker containers. Enable address sanitizer for SWSS container.

- Why I did it
To add a possibility to compile SONiC applications with address sanitizer (ASAN).
ASAN is a memory error detector for C/C++. It finds:
1. Use after free (dangling pointer dereference)
2. Heap buffer overflow
3. Stack buffer overflow
4. Global buffer overflow
5. Use after return
6. Use after the scope
7. Initialization order bugs
8. Memory leaks

- How I did it
By adding new ENABLE_ASAN configuration option.

- How to verify it
By default ASAN is disabled and the SONiC image is not affected.
When ASAN is enabled it inspects all allocation, deallocation, and memory usage that the application does in run time. To verify whether the application has memory errors tests that trigger memory usage of the application should be run. Ideally, the whole regression tests should be run. Memory leaks reports will be placed in /var/log/asan/ directory of SONiC host OS.

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
2022-02-09 13:29:18 +02:00
Marty Y. Lok
04a4b8dcb1
[multiasic][database]database.sh failed to create the database for namespace (#9502)
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503

How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
2021-12-13 10:17:05 -08:00
Marty Y. Lok
cb4c66ae98
[chassis][multiasic] fixed rsyslogd FATAL issue in the database container in multi-asic box (#8390)
Why I did it
Fix for issue #8389

How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.Why I did it
Fix for issue #8389

How I did it
The /etc/rsyslog.conf is empty file which cause the FATAL of the process rsyslogd in the global instance database container. The function updateSyslogConf() should only generate the rsyslog.conf for containers in the namespace. it should not do it for the containers in the global instance. Instead, default rsyslog.conf should be used. Especially for database container, updateSyslogConf() is called before the database container is created. The result cause the sonic-cfggen failed to generate the rsyslog.conf.

Signed-off-by: mlok <marty.lok@nokia.com>
2021-12-01 07:16:49 -08:00
Stephen Sun
b3ccef9c08
[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-11-24 15:00:23 +02:00
Saikrishna Arcot
33e4b7f90e Fix Python 3 syntax in SONiC container startup scripts
The common startup script used for SONiC containers is calling an inline
python command that uses Python 2 syntax, and thus errors out when run
with Python 3. Make this work with Python 3.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-10 15:27:22 -08:00
tjchadaga
8544147a70
Fix for additional intf flap during fast-reboot (#9166) 2021-11-08 15:21:11 -08:00
Lawrence Lee
fad5ec47b4 [mux]: Call write_standby from host only
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-10-15 09:59:59 -07:00
Lawrence Lee
5232647b33 [mux]: Make write_standby available on host
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

[write_standby]: Cleanup and fix build

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-10-15 09:59:59 -07:00
Vaibhav Hemant Dixit
ee9250e8cc
Save DB dump after warm/fast reboot (#8803)
As a part of warmboot, redis database is dumped:
c97fe546e5/scripts/fast-reboot (L269)
However, this dump file is deleted, after it is loaded back into db post reboot.
The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful.
Instead of deleting the dump, rename and keep the dump.
2021-09-23 23:53:22 -07:00
Stephen Sun
c895677507
Use predefined macro as vendor information (#8361)
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created

#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`

#### How to verify it
Manually test.
2021-08-16 00:36:48 -07:00
vdahiya12
498422968f
[pmon] create and mount firmware directory on PMON for firmware upgrade support on muxcable (#8283)
This PR creates a directory firmware on the HOST with the path /usr/share/sonic/firmware, as well as this is 
mounted on PMON container with the same path /usr/share/sonic/firmware. This is required for firmware 
upgrade support for muxcable as currently by design all Y-Cable API's are called by xcvrd. As such if CLI has 
to transfer a file to PMON we need to mount a directory from host to PMON just for getting the firmware files. 
Hence we require this change.

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2021-08-03 09:39:39 -07:00
賓少鈺
aa59bfeab7
[PDE]: introduce the SONiC Platform Development Env (#7510)
The PDE silicon test harness and platform test harness can be found in
src/sonic-platform-pdk-pde
2021-07-24 16:24:43 -07:00
vganesan-nokia
ffca17da0b
[voqsystemlagid] Fix for timing issue in setting system lag id boundary (#7911)
The voq system lag id boundary is set in redis-chassis. Changes include
setting this from database-chassis container. This fixes a timing issue
in finding datbase_config.json file from redis directory which is
created from database container. Since database container usually
starts after database-chassis container the existence of this file is
unreliable while running the command. Running the command under
database-chassis container makes sure that the database_config.json form
redis-chassis directory is guaranteed to be available and hence fixes the
timing issue.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2021-06-30 09:57:08 -07:00
Kebo Liu
078e0e0410
mount 'mellanox' folder only instead of create each sub folder (#7830)
#### Why I did it

Following the discussion in another PR https://github.com/Azure/sonic-buildimage/pull/7708#discussion_r642933510 , since there will be multi subfolders under **/var/log/mellanox**, so we agreed to only mount this folder and the subfolders will be created afterward on demand.  

#### How I did it

during the syncd docker creation, only mount  folder **/var/log/mellanox**

#### How to verify it

build an Mellanox image and verify the related folder on the host and docker side.
2021-06-22 06:27:56 -07:00
vganesan-nokia
b313d4d092
[systemlag] Lag id boundary set for system lag (#6488)
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes for setting platfrom specific lag id boundary id in the chassis
app db. The platfrom specific lag id boundaries are supplied via
chassisdb.conf. The lag_id_start and lag_id_end boundary values sourced
from this file are set in chassis app db which will be used by lag id
allocator to allocate unique lag id in atomic fashion
2021-03-30 23:21:53 -07:00
Stepan Blyshchak
1f7d9e2698
[docker_img_ctl.j2] make tmpfs mounts optional and add ability to run container by image id (#6439)
- Why I did it
I made the docker_img_ctl.j2 applicable for more dockers (including application extensions dockers) by adding an option not to mount tmpfs on /tmp/ and /var/tmp/. In some applications /tmp/ is a different docker volume which can't be tmpfs.
Also, I added and ability to pass REPO[:TAG]|[@digest]/IMAGE_ID instead of just REPO name.

- How I did it
Modified docker_img_ctl.j2 and docker makefiles.

- How to verify it
Run it on the switch.
2021-03-16 17:03:12 +02:00
abdosi
cfa8fbbf1a
[baseimage]: Updates for Ebtables and support for multi-asic (#6542)
Following changes were done for ebtables:

- Support for Multi-asic platforms. Ebtable filters are installed in namespace for multi-asic and not host. On Single asic installed on  host.

- For Multi-asic platforms we don't want to install on host otherwise Namespace-to-Namespace communication does not happens since ARP Request are not forwarded.

- Updated to use text file to restore ebtables rules then the binary format. Rules are restore as part of Database docker init instead of rc.local

- Removed the ebtable service files for buster as not needed as filters are restored/installed as part of database docker init.
   All the binaries are pre-installed with ebtables* binary are same as ebatbles-legacy-* 

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-01-27 08:36:10 -08:00
judyjoseph
46b3bd5503
[teamd]: Increase wait timeout for teamd docker stop to clean Port channels. (#6537)
The Portchannels were not getting cleaned up as the cleanup activity was taking more than 10 secs which is default docker timeout after which a SIGKILL will be send.
Fixes #6199
To check if it works out for this issue in 201911 ? #6503

This issue is significantly seen in master branch compared to 201911 because the Portchannel cleanup takes more time in master. Test on a DUT with 8 Port Channels.

master

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m15.599s
    user    0m0.061s
    sys     0m0.038s
Sonic 201911.v58

    admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd
    real    0m5.541s
    user    0m0.020s
    sys     0m0.028s
2021-01-23 20:57:52 -08:00
Renuka Manavalan
ba02209141
First cut image update for kubernetes support. (#5421)
* First cut image update for kubernetes support.
With this,
    1)  dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled
        for kube management
        init_cfg.json configure set_owner as kube for these

    2)  Each docker's start.sh updated to call container_startup.py to register going up
          As part of this call, it registers the current owner as local/kube and its version
          The images are built with its version ingrained into image during build

    3)  Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'.
         For all locally managed containers, it calls docker commands, hence no change for locally managed.
        
    4)  Introduced a new ctrmgrd service, that helps with transition between owners as  kube & local and carry over any labels update from STATE-DB to API server

    5)  hostcfgd updated to handle owner change

    6) Reboot scripts are updatd to tag kube running images as local, so upon reboot they run the same image.

   7) Added kube_commands.py to handle all updates with Kubernetes API serrver -- dedicated for k8s interaction only.
2020-12-22 08:01:33 -08:00