Why I did it
Fix: #16086
faketime package url expired. It breaks 201911 build.
Update package url.
Work item tracking
Microsoft ADO (number only): 24930879
Why I did it
[E1031] fix pca9548 initializes failed occasionally in stress test.
When failure happened, ismt i2c bus hang up and need power cycle to
recover it.
How I did it
Add 0.5s delay between setuping and configuring pca9548 i2c mux.
How to verify it
Reboot stress test at least 100 times without failure.
Why I did it
1GB disk size is not sufficient for latest 201811 broadcom raw image.
Work item tracking
How I did it
Increase raw image disk size from 1024MB to 1216MB
How to verify it
Initiate build for 'target/sonic-broadcom.raw' and verify that the build is successful.
Why I did it
To remove compression of the generated raw image during build.
Work item tracking
How I did it
Removed usage of 'gzip' on the generated raw image.
How to verify it
Initiate build for 'target/sonic-broadcom.raw' and verify that the size of the raw image generated is same as RAW_IMAGE_DISK_SIZE.
* [Build][201811] Fix the jessie mirror removed issue
* Fix build break for jessie apt key expiration. (#13328)
The GPG key used for Jessie's official repos has since expired, which means building 201911 images no longer works.
Fake the time to be before the expiry date.
* [build] Fix issues caused by docker.com gpg key update. (#14063)
Why I did it
docker.com's gpg key start to work from 2023-02-23. While debian.org's gpg key expired in 2022-11.
We used a walkaround for security checking for debian gpg keys. Now we need to exclude docker.com's gpg key.
How I did it
Update docker.com's gpg key without faketime.
Update others' gpg key with faketime '2022-11'
How to verify it
* Fix build break for jessie apt key expiration
---------
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
Arista 7060 platform has a rare and unreproduceable PCIe timeout that could possibly be solved with increasing the switch PCIe timeout value. To do this we'll call a script for this platform to increase the PCIe timeout on boot-up.
No issues would be expected from the setpci command. From the PCIe spec:
"Software is permitted to change the value in this field at any
time. For Requests already pending when the Completion
Timeout Value is changed, hardware is permitted to use either
the new or the old value for the outstanding Requests, and is
permitted to base the start time for each Request either on when
this value was changed or on when each request was issued. "
How I did it
Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in device data folder.
Additionally, add pciutils dependency to docker-orchagent so it can run the setpci commands.
How to verify it
On bootup of an Arista 7060, can execute:
lspci -vv -s 01:00.0 | grep -i "devctl2"
In order to check that the timeout has changed.
Why I did it
smartctl tool is available only in PMON docker. Hence, the tool may be not accessible incase PMON docker goes down.
Using iSMART_64 tool to fetch the SSD firmware version and device model information.
How I did it
Replacing smartctl with iSMART_64.
swss:
* c9efae8 2023-01-25 | [201811][portinit] Do not call GET on SAI_PORT_ATTR_SPEED when AUTONEG is enabled (#2484) (#2640) (github/201811) [Ying Xie]
* ab785d8 2020-10-02 | remove FDB entires of BridgePort before removing the BridgePort (#1451) [madhanmellanox]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
#### Why I did it
To allow SSH connections from IPv6 addresses
Resolves https://github.com/Azure/sonic-buildimage/issues/7668
#### How I did it
In build_debian.sh, modify sshd_config file so as to enable listening for IPv6 connections
* Fix to improve hostname handling
If config_db.json is missing hostname entry, hostname-config.sh ends
up deleting existing entry too and hostname changes to default 'localhost'
* default hostname to 'sonic` if missing in config file
- Convert to Python 3
- Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` instead
- Remove locally-define logging functions; use Logger class from sonic-py-common instead
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
Why I did it
Add the hardware reboot cause when the previous software reboot failed
How I did it
Check both hardware reboot cause and software reboot cause.
Add the hardware reboot as actual reboot cause
if any hardware reboot cause is available for any software reboot.
How to verify it
Perform reboots and verify the reboot-cause
Why I did it
To port DellEMC S6100 SSD upgrade status checker changes from master (based on #7289) to 201811 branch
Handle newer SSD firmware version (S210506G - 3IE devices)
Recover SSD upgrade state if in case ssd_fw_upgrade folder got deleted
Why I did it
Improve throughput and latency for 7260 deployments
How I did it
Update the dynamic threshold to 0 and ECN settings as 2mb/10mb/5%
How to verify it
With the new dir structure on 7260, updated the new alpha values and ecn settings in the appropriate files, loaded minigraph and verified that the new settings are applied
Added unit tests for rendering the qos template for 7260. Built sonic config engine wheel successfully
neethajohn added 6 commits 10 days ago
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
There is a need to select different mmu profiles based on deployment type
How I did it
There will be separate subfolders (RDMA-CENTRIC, TCP-CENTRIC, BALANCED) in each hwsku folder which contains deployment specific mmu and qos settings. SonicQosProfile attribute in the minigraph will be used to determine which settings to use. If that attribute is not present, the default settings that exist in the hwsku folder will be used
How to verify it
Manually verified by creating subfolders under DX010 folder and updated minigraph on the device with SonicQosProfile pointing to one of the subfolders. Verified by loading minigraph and confirming the mmu and ecn settings are as expected.
Unit tests have been updated to include the SonicQosProfile attribute in the minigraphs used for buffer and qos rendering. Since the dir structure does not exist now, code flow will take no action. Once the dir structure exists, the profile selection path will get executed but since all 3 subfolders have identical values, existing test cases should hold good.
Build sonic-config-engine python wheel with the changes and build was successful.
#### Why I did it
When too many user login concurrently and run commands, SONiC may kernel panic on some device which has very limited memory.
#### How I did it
Add j2 template for setup pam_limit plugin for limit SSH session per-user.
#### How to verify it
Manually validate the j2 template can generate correct config file.
#### Which release branch to backport (provide reason below if selected)
- [x] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
#### Description for the changelog
Add j2 template for setup pam_limit plugin for limit SSH session per-user.
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
This was an ask by Microsoft to provide:
7260 config.bcm file for hardware sku Arista-7260CX3-D92C16 (Named Arista-7260CX3-D96C16).
There are 16 100G uplinks:
Ethernet13-20/1
Ethernet45-52/1
All other ports are breakout to 2 50G ports.
How I did it
Copied existing Arista-7260CX3-D108C8 HWSKU and altered the bcm.config and port_config.ini files.
How to verify it
The new 100G ports do come up with a 201811 image using this HWSKU.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
sshd overrides user password with a bad one, when pre-auth fails.
Refer PR #9123 for more details
How I did it
Manual cherry pick of PR #9123
How to verify it
Pick a user alias that has not logged into the switch yet
Add this alias to /etc/tacplus_user
Attempt to login as that user
Look for the error message in /var/log/syslog
e.g. "Feb 18 19:16:41.592191 sonic ERR sshd[5233]: auth fail: Password incorrect. user: user_xyz"
Why I did it
There is a small window between load & listen to config-DB. If TACACS config got updated during that gap, the listen will not show it, hence hostcfgd would miss it, until another update.
How I did it
porting PR #8223, which uses one shot timer to reload tacacs config.
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs
How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
Description for the changelog
Add emmc quirks for Upperlake
* dhcp6relay: Save the dbgsym package into the target folder (#9013)
This makes it possible to install the debug symbols if needed. Also install
the package into the debug version of sonic-dhcp-relay container.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Package debugging and hardening for dhcpmon and dhcp6relay (#9862)
Enable dbgsym package for dhcpmon.
Allow CFLAGS and LDFLAGS from environment variables to be used
in the dhcp6relay build. This makes sure that the -O2 flag from
dpkg-buildflags gets used.
Finally, enable all hardening flags in dpkg-buildflags for
dhcp6relay and dhcpmon. The change from the default set of flags is that
during linking, immediate binding of symbols is done instead of lazy
binding.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* [201811][dhcp] update debian build rules
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
[201811] Check platform reboot cause to see if any reset happened during fast/warm-reboot
Why I did it
To recover syncd and swss from any cold reset during fast/warm-reboot
How I did it
Check platform reboot-cause to see if any cold reset happens for fast-reboot power up
How to verify it
Manual test
Why I did it
DHCPv6 Relay information will be stored in DHCP_RELAY table instead of VLAN table in the future.
How I did it
Change dhcp_relay docker files to parse through DHCP_RELAY to check for dhcpv6 status
How to verify it
Build dhcp_relay docker and check all dhcp_relay and dhcpmon are running properly
Which release branch to backport (provide reason below if selected)
* Add DHCPv6 minigraph parsing support
Co-authored-by: shlomibitton <60430976+shlomibitton@users.noreply.github.com>
Logrotate for wtmp and btmp files to fix size getting too large. (#8744)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
[201811][utilities][swss][snmpagent] advance sub module head
snmpagent
* 187aa10 2021-09-16 | [201811][RFC1213]: Initialize lag oid map in reinit_data (#233) (github/201811) [SuvarnaMeenakshi]
swss:
* 3503705 2021-09-05 | [201811][Cherry-pick] [acl mirror action] Mirror session ref count fix at acl rule attachment (#1898) (HEAD -> 201811, github/201811) [bingwang-ms]
utilities:
* f3f8667 2021-10-15 | [201811] disk_check.py: Allow remote user access when disk is read-only (#1873) (HEAD -> 201811, github/201811) [Renuka Manavalan]
* 6b351c9 2021-10-14 | [201811] Remove exec from platform_reboot_plugin call to handle any hang issue. (#1880) [Sujin Kang]
* d8d0461 2021-07-29 | [minigraph][port_config] Consume port_config.json while reloading minigraph (#1726) [Blueve]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
[201811] Invoke disk check periodically (#8951)
* Invoke disk check periodically. (#7374)
Why I did it
Helps with periodic scan of disk for RO state.
If found, this script makes transient fix and raise error message.
Save DB dump after warm/fast reboot (#8913)
Back porting the master branch change - #8803
Save the redis DB dump after warm reboot.
[201811][swss] advance swss submodule head (#9049)
* e0b115a 2021-10-22 | [copp] add dhcpv6 copp rules (#1979) (HEAD -> 201811, github/201811) [Ying Xie]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
[swssconfig] load dhcpv6 copp rules by default (#9047)
Why I did it
Need to enable DHCPv6 copp rule
How I did it
Add a separate DHCPv6 copp rule config file and load it during cold reboot.
How to verify it
cold reboot, and verify config being loaded and dhcpv6 rules got installed.
Signed-off-by: Ying Xie ying.xie@microsoft.com
[warmboot finalizer] load dhcpv6 copp rules when missing (#9048)
Why I did it
Need to enable DHCPv6 COPP rules.
How I did it
Load the separate DHCPv6 COPP rules after warm reboot if the rules are missing.
How to verify it
Warm reboot from an image doesn't have DHCPv6 COPP rules installed.
Warm reboot from an image have DHCPv6 COPP rules already installed.
In either case, the script did the right thing and only install the COPP rules if it is missing.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Why I did it
Need to enable DHCPv6 COPP rules.
How I did it
Load the separate DHCPv6 COPP rules after warm reboot if the rules are missing.
How to verify it
Warm reboot from an image doesn't have DHCPv6 COPP rules installed.
Warm reboot from an image have DHCPv6 COPP rules already installed.
In either case, the script did the right thing and only install the COPP rules if it is missing.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Why I did it
Need to enable DHCPv6 copp rule
How I did it
Add a separate DHCPv6 copp rule config file and load it during cold reboot.
How to verify it
cold reboot, and verify config being loaded and dhcpv6 rules got installed.
Signed-off-by: Ying Xie ying.xie@microsoft.com
* Invoke disk check periodically. (#7374)
Why I did it
Helps with periodic scan of disk for RO state.
If found, this script makes transient fix and raise error message.
Why I did it
201811 branch image build has been failing due to the certificate expiring: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021. This issue so far only affect Jessie docker because it is using openssl 1.0.
How I did it
Remove the expired cert and refresh the certs bundle.
How to verify it
Build image.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
serial-getty service exited in Dell S6100 device randomly.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
Why I did it
In upgrade scenarios, where config_db.json is not carry forwarded to new image, it could be left w/o TACACS credentials.
Added a service to trigger 5 minutes after boot and restore TACACS, if /etc/sonic/old_config/tacacs.json is present.
How I did it
By adding a service, that would fire 5 mins after boot.
This service apply tacacs if available.
How to verify it
Upgrade and watch status of tacacs.timer & tacacs.service
You may create /etc/sonic/old_config/tacacs.json, with updated credentials
(before 5mins after boot) and see that appears in config & persisted too.
Why I did it
7050 S4Q31 mmu configuration is missing ALPM configurations, causing not enough memory reserved for routes. Orchagent crashes on a nightly testbed with 6400 route entries.
How I did it
Add the missing ALPM configurations.
How to verify it
Load the configuration on testbed and verified new configuration exists and no more crash.
Signed-off-by: Ying Xie ying.xie@microsoft.com
This PR contains the following changes
Original Arista-7050-QX-32S sku (32x40G ports) has been renamed to Arista-7050QX32S-Q32
Arista-7050-QX-32S is symlinked to Arista-7050QX-32S-S4Q31 (4x10G, 31x40G ports)
Signed-off-by: Neetha John <nejo@microsoft.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Need proper MMU and Qos settings for Arista-7050QX-32S-S4Q31
How I did it
Updated the settings based on Arista-7050-QX-32S
Why I did it
Backport #7744
How to verify it
Ran sonic-cfggen on a minigraph and verified that interface of type DeviceMgmtLink has speed set in the PORT table from the bandwidth attribute in the minigraph
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
PG profile settings need to be aligned with Arista-7050-QX-32S
How I did it
Copy over the current settings from Arista-7050-QX-32S and define params for 10G and 1G speeds as well
Signed-off-by: Neetha John <nejo@microsoft.com>
* Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh
How I did it
Check if the command starting with "show", and verify only contains single command in script.
* Fix the type issue in rvtysh
Recent changes brought l2 vlan concept which does not have DHCP
clients behind them and so DHCP relay is not required. Also,
dhcpmon fails to launch on those vlans as their interfaces
lack IP addresses. This PR backposts #6527 that limits launch
of both DHCP relay and dhcpmon to L3 vlans only.
original-pr: #6527
singed-off-by: Tamer Ahmed tamer.ahmed@microsoft.com
When submitting a new official build for broadcom, vs, it prompts a error message, which says the job is not defined.
It was caused by the default option "[]", which is not empty, it is used as the jobGroups parameter.
Why I did it
Improve the version of the Pull Request build by changing the local branch name.
How I did it
Change the default branch name merge to [target_branch_name]-[pullrequestid].
How to verify it
For official build, the version is not changed.
For pull request build, the version as below:
Why I did it
Fix the boolean value case sensitive issue in Azure Pipelines
When passing parameters to a template, the "true" or "false" will have case sensitive issue, it should be a type casting issue.
To fix it, we change the true/false to yes/no, to escape the trap.
Support to override the job groups in the template, so PR build has chance to use different build parameters, only build simple targets. For example, for broadcom, we only build target/sonic-broadcom.bin, the other images, such as swi, debug bin, etc, will not be built.
Why I did it
Failed to build the centec-arm64 for no space in docker data root.
How I did it
Change to use the data root to the folder under /data.
See detail info about DOCKER_DATA_ROOT_FOR_MULTIARCH in the file Makefile.work.
How to verify it
Set the environment variable, then the /data used as docker root.
Why I did it
Fix the wrong build options and improve display name for some tasks
1. Fix the wrong build architecture for arm64 and armhf
2. Fix the build timeout parameter
Why I did it
To monitor the SSD health condition in DellEMC S6100 platform post upgrade.
A daemon is introduced to monitor the SSD every one hour.
To check for SSD status at boot time and at the time of cold-reboot.
All these changes are supported only for newer SSD firmware.
Porting changes from 201911 branch
Added a platform_reboot_pre_check script to prevent cold-reboot based on SSD status.
Depends on Azure/sonic-utilities#1557
#### Why I did it
- xcvrd crash was seen in latest 201811 images.
- For Dell S6100,API 2.0 uses poll mode while 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in 1.0 to poll mode in all the related branches.
Why I did it
sonic snmp subagent build was failing recently.
How I did it
install python-arptable and psutil in sonic slave docker to address sonic snmp subagent build issue.
How to verify it
build 201811 image locally.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Why I did it
for 201811 build image, swsssdk wheel package is not getting build due to redis==2.10.6 requirement issue. Notied that after upgrading pip to 20.3.3, we are experiencling this issue.
the issue
21:10:41 /sonic/src/sonic-py-swsssdk /sonic
21:10:41 running test
21:10:41 Searching for redis==2.10.6
21:10:41 Reading https://pypi.python.org/simple/redis/
21:10:41 Couldn't find index page for 'redis' (maybe misspelled?)
21:10:41 Scanning index of all packages (this may take a while)
21:10:41 Reading https://pypi.python.org/simple/
21:10:41 No local packages or download links found for redis==2.10.6
21:10:41 error: Could not find suitable distribution for Requirement.parse('redis==2.10.6')
21:10:41 [ FAIL LOG END ] [ target/python-wheels/swsssdk-2.0.1-py3-none-any.whl ]
21:10:41 slave.mk:422: recipe for target 'target/python-wheels/swsssdk-2.0.1-py3-none-any.whl' failed
21:10:41 make: *** [target/python-wheels/swsssdk-2.0.1-py3-none-any.whl] Error 1
21:10:43 Makefile.work:132: recipe for target 'target/sonic-aboot-broadcom.swi' failed
21:10:43 make[1]: *** [target/sonic-aboot-broadcom.swi] Error 2
21:10:43 make[1]: Leaving directory '/data/johnar/workspace/broadcom/buildimage-brcm-201811'
21:10:43 Makefile:6: recipe for target 'target/sonic-aboot-broadcom.swi' failed
So, what I have understood till now, if pip v20.3.3 is able to produce a wheel that is not installable because it raises pip._vendor.packaging.version.InvalidVersion or some error like that (resource- > pypa/pip#9206), it just raises an exception when building the wheel.
SO, this issue occurs when we pinned down pip to 20.3.3 in short.
As of now, there are two solutions mentioned below.
pin down pip to 20.3.3(which it is) and explicitly install packages.
pin down pip to 20.3.4(pip wheel now verifies the built wheel contains valid metadata, and can be installed by a subsequent pip install.)(resource-> https://pip.pypa.io/en/stable/news/) -- didn't try yet
How I did it
Install nose explicitly for mockredispy
Install redis==2.10.6 for swsssdk tests.
How to verify it
Run local build after removing all previously built dockers.
* [build] Fix the snmp docker build error. (#7452)
Issue is get_pip.py is moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main) which is not compatible with 3.6.
Issue of pip itself is fixed as part of 21.1.1 in pip community (pypa/pip#9835).
However get-pip.py is still not updated to latest pip. Also get.pip.py does not support python 3.6 version explicitly (pypa/get-pip#88)
Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
---> Running in bece31f49267
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 1891k 100 1891k 0 0 9564k 0 --:--:-- --:--:-- --:--:-- 9600k
Traceback (most recent call last):
File "<stdin>", line 24298, in <module>
File "<stdin>", line 139, in main
File "<stdin>", line 115, in bootstrap
File "<stdin>", line 96, in monkeypatch_for_cert
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module>
File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module>
ImportError: cannot import name 'NoReturn'
The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py | python3.6' returned a non-zero code: 1
How I did:
Got the file from https://github.com/pypa/get-pip/tree/21.0 and added to the buildimage
pin pip to the previous release 21.0.1. (Similar is done in other public repos eg: grpc/grpc-java#8115)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
201811 build was failing due to the newly added bcm config file key word was not on the permitted list
How I did it
add phy_an_lt_msft to permitted list.
How to verify it
Build device package.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
To enable autonegotiation/link training on some Broadcom-based platforms (Arista 7060 and 7260, Celestica DX010)
How I did it
Add appropriate SOC property for enabling the feature to the Broadcom config files of appropriate platforms
Also convert line endings to UNIX format for one Celestica file
see below error:
+ sudo https_proxy= LANG=C chroot ./fsroot easy_install pip==20.3.3
Searching for pip==20.3.3
Reading https://pypi.python.org/simple/pip/
Couldn't find index page for 'pip' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
No local packages or working download links found for pip==20.3.3
error: Could not find suitable distribution for Requirement.parse('pip==20.3.3')
How I fix:
Install python-pip via apt-get
Pin the version to 20.3.3
Master has same changes.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Basically mlnx-fw-upgrade.sh is used in two places:
1. https://github.com/Azure/sonic-buildimage/blob/201811/files/scripts/syncd.sh#L109
```bash
/usr/bin/mst start
/usr/bin/mlnx-fw-upgrade.sh
/etc/init.d/sxdkernel start
/sbin/modprobe i2c-dev
```
2. https://github.com/Azure/sonic-buildimage/blob/201811/device/mellanox/x86_64-mlnx_msn2700-r0/platform_reboot#L32
```bash
ParseArguments "$@"
${FW_UPGRADE_SCRIPT} --upgrade --verbose
EXIT_CODE="$?"
```
In first case the `stdout` is redirected to `syslog` directly by `systemd`.
Thus, the `syslog` logger is only required in second case.
#### Why I did it
* To improve ASIC/CPLD FW upgrade logging
* To improve CPLD upgrade time
#### How I did it
* Added `syslog` logger support
* Replaced `_pciconf0` -> `_pci_cr0` to reduce CPLD upgrade time
#### How to verify it
1. mlnx-fw-upgrade.sh --upgrade
Backport of https://github.com/Azure/sonic-buildimage/pull/7031 to the 201811 branch
#### Why I did it
To enable parsing the `AutoNegotiation` element from the LinkMetadata section of minigraph file
#### How I did it
Parse the value `AutoNegotiation` element from the `LinkMetadata` section of minigraph file. If the element is present, an `autoneg` key will be added to the port in the `PORT` table of Config DB with a value of either `0` or `1`
If an `autoneg` value is present in port_config.ini, the value from the minigraph will take precedence, overriding that value.
Also remove `AutoNegotiation` and `EnableAutoNegotiation` elements from the `DeviceInfo` section, as we will use this data in the `LinkMetadata` section to determine whether to enable auto-negotiation for a port.
Why I did it
The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100.
How I did it
Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100.
Fixed the "/host unmount failed" issue as well in 201811.
How to verify it
Issue "reboot" command to verify if the reboot is happening gracefully.
#### Why I did it
- The iSMT SMBUS I2c bus number conflicts in different kernel versions.
#### How I did it
- Add I2cbus number detector for iSMT bus
- Replace iSMT bus number in fancontrol config
[CRM] Safety check for division by 0 (#1569)
[crm]: Typecast to unit64_t to avoid divide by 0 during overflow (#1550)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Updated commit hash pointer in the relevant Makefile for the repository which contains SDK packages.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
With the release of pip21.0 (https://pypi.org/project/pip/#history) on branch
201811 stretch build is failing with below error logs:
As per https://pypi.org/project/pip/ pip21.0 does not not support python2
from Jan 2021. To fix this tag the pip to 20.3.3 version which was being used last
and is working fine.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
fixesAzure/sonic-utilities#1389
With the recent changes in sudoer files. The show commands fails for the read-only users.
The problem here is the 'docker ps' is failing in the function [get_routing_stack()](8a1109ed30/show/main.py (L54)) therefore all the CLI commands are failing.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Fix#119
when parallel build is enable, multiple dpkg-buildpackage
instances are running at the same time. /var/lib/dpkg is shared
by all instances and the /var/lib/dpkg/updates could be corrupted
and cause the build failure.
the fix is to use overlay fs to mount separate /var/lib/dpkg
for each dpkg-buildpackage instance so that they are not affecting
each other.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
when parallel build is enabled, both docker-fpm-frr and docker-syncd-brcm
is built at the same time, docker-fpm-frr requires swss which requires to
install libsaivs-dev. docker-syncd-brcm requires syncd package which requires
to install libsaibcm-dev.
since libsaivs-dev and libsaibcm-dev install the sai header in the same
location, these two packages cannot be installed at the same time. Therefore,
we need to serialize the build between these two packages. Simply uninstall
the conflict package is not enough to solve this issue. The correct solution
is to have one package wait for another package to be uninstalled.
For example, if syncd is built first, then it will install libsaibcm-dev.
Meanwhile, if the swss build job starts and tries to install libsaivs-dev,
it will first try to query if libsaibcm-dev is installed or not. if it is
installed, then it will wait until libsaibcm-dev is uninstalled. After syncd
job is finished, it will uninstall libsaibcm-dev and swss build job will be
unblocked.
To solve this issue, _UNINSTALLS is introduced to uninstall a package that
is no longer needed and to allow blocked job to continue.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
sonic-slave tag only allows all lower case. In case the user
name is mixed case, we need to change user name to all lower case.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
87e1a36 2020-04-16 | Do not set PG to Buffer porfile mapping again if already exist. (#1261) (HEAD -> 201811, origin/201811) [abdosi]
Signed-off-by: Guohan Lu <lguohan@gmail.com>
backport c4b5b002c3
make swss build depends only on libsairedis instead of syncd. This allows to build swss without depending
on vendor sai library.
Currently, libsairedis build also buils syncd which requires vendor SAI lib. This makes difficult to build
swss docker in buster while still keeping syncd docker in stretch, as swss requires libsairedis which also
build syncd and requires vendor to provide SAI for buster. As swss docker does not really contain syncd
binary, so it is not necessary to build syncd for swss docker.
[submodule]: update sonic-sairedis
* 9a66890 2020-06-28 | [build]: add option to build without syncd (HEAD -> 201811, origin/201811) [Guohan Lu]
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Why I did it
During upgrade, if config is loaded from minigraph, it would miss TACACS credentials. This leads to device losing remote user accessibility
- How I did it
During update graph, when config is loaded from minigraph, look for TACACS credentials back-up and load that if available
- How to verify it
Remove /etc/sonic/config-db.json, save TACACS credentials in /etc/sonic/tacacs.json and do a Image upgrade. Do image upgrade and boot into new image. Verify remote user access is available.
NOTE: This change is available in master via PR #6285
Haliburton needed watchdog daemon to monitor the basic health of a machine. If something goes wrong, such as a crashing program overloading the CPU, or no more free memory on the system, watchdog can safely reboot the machine,
* [SAI] Add PFC pause duration counters in microseconds
**- Why I did it**
To add PFC pause duration counters in microseconds
**- How I did it**
Updated SAI to version 1.14.3
**- How to verify it**
**- Description for the changelog**
[Mellanox] Update SAI to version 1.14.3
This PR adds the changes in #6198 to 201811 branch to support warm-reboot image upgrade for kvm images.
The sai.profile file in kvm images overrides the warmboot file with path /var/cache/sai_warmboot.bin. Since the directory /var/cache is not mounted in syncd, it will be cleared in an image upgrade, the warm-reboot image upgrade will fail if the file is put in the directory.
Remove the path that overrides the default path. The warmboot file path will then be the default value /var/warmboot/sai-warmboot.bin. Since /var/warmboot/ is mounted by /host/warmboot/ in the host, it could survive an image upgrade.
Since DOCKER_SYNCD_VS is no longer being used, the mount option does not properly mount the warmboot file directory. Fix the mount option so that the directory is properly mounted.
During platform deinitialization, dell_ich is not removed properly and when we do initialize s6100 platform, ICH driver sysfs attributes are not attached. Because of this, get_transceiver_change_event returns error and this leads xcvrd to crash.
- What I did
Ported a fix from libteam master to our master.
Fixes#4070Fixes#3649
- How I did it
Applied patch jpirko/libteam@c723737 from upstream.
- How to verify it
Build image for your DUT and warm-reboot your DUT 10 times. Check that all PortChannels are up and no error messages in teamd.log
To address issue #5525
Explicitly control the grub installation requirement when it is needed.
We have scenario where configuration migration happened but grub
installation is not required.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [platform/cel-dx010]: add gpio init for fan direction
* [platform/cel-dx010]: remove invalid code on fancontrol service
* [platform/cel-dx010]: modify fancontrol service permission
* [platform/cel-dx010]: install fancontrol in pmon
Printing both snapshot and current counter sets will make it easier to pinpoint
which message type(s) is/are not being relayed. This PR prints both counter sets.
Also, this PR defines gnu11 as a C standard to compile with in order to avoid
making changes when porting to 201811 branch.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When BGP routes are missing, DHCP packets get relayed over mgmt
interface. This results in dhcpmon alerting that DHCP packets are
not being relayed. This is PR include mgmt interface as uplink
device, and so, if DHCP packet gets relayed over mgmt interface,
regular dhcpmon alert will not be issues. Instead, dhcpmon will
check the mgmt interface counts and issue a separate alert regarding
packets travelling through mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase alert grace window to 3 minutes from currently 2 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* [201811][swss-common] advance swss-common sub module head
- Fix SubscriberStateTable::hasCachedData formula for a timing risk (#379)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix build of the unit test of SubscriberStateTable (#383)
When BGP routes are missing, DHCP packets get relayed over mgmt
interface. This results in dhcpmon alerting that DHCP packets are
not being relayed. This is PR include mgmt interface as uplink
device, and so, if DHCP packet gets relayed over mgmt interface,
regular dhcpmon alert will not be issues. Instead, dhcpmon will
check the mgmt interface counts and issue a separate alert regarding
packets travelling through mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase alert grace window to 3 minutes from currently 2 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97f1e. On systems that are misaligned before conversion
(partition start is the first sector), the relica partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.
Zeroing this old relica makes sure that there is nothing left of the old
partition lying around. There won't be any risk of having Aboot corrupt
the new filesystem because of the old relica.
Signed-off-by: Baptiste Covolato <baptiste@arista.com>
Fix an error which causes bgpcfgd crash on invalid ip address. Before the fix we had an issue here. When either loopback ipv4 or ipv6 addresses were already set and bgpcfgd received another "SET" message for already set ip loopback address, bgpcfgd will send syslog message about ambiguous ip address (despite the fact that the address is good) and crash of bgpcfgd. With this change this behavior is changed: if we receive ip address and this ip address is already set, bgpcfgd will send this message to the syslog and return from the handler.
If a device had a master or 201911 image then installed a 201811 image, it could result in a prefdl that was not properly processed by 201811 Arista code.
This is a commit that was on 201911 and master branch.
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
sonic-quagga using utility from master branch of sonic-buildimage. I had to create 201811 branch in sonic-quagga which could work with 201811 branch of sonic-buildimage.
Fix the method get_transceiver_change_event to abide by the function description, return True status and use timeout in milliseconds.
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
[aclorch] Use IPv6 Next Header internally for protocol number on MLNX platform (#1343)
Add/Del lag_name_map item according to lag adding and removing (#1124)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Update fancontrol service for Seastone-DX010/E1031 device to support hysteresis temperature threshold and difference config for each unit fan direction type (B2F/F2B); follow master branch
FDB/ARP/Default routes files are deleted after swssconfig. This
makes debugging/validation of device conversion hard. This PR
saves those files in order to facilitate debugging of device conversion.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
While migrating to SONiC 20181130, identified a couple of issues:
1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127.
2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.
Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- Broadcom SAI 3.5 GA code drop on 20200608.
Changes:
- CS9533198
- CS10283709
- CS00009716645
- CS00010389861
- CS00010406122
- CS00010503275
- Addressed a few memory leak issues.
- Addressed an array memory allocation issue.
- Addressed assert during SER handling.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.
This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
**- Why I did it**
When I tested auto-restart feature of swss container by manually killing one of critical processes in it, swss will be stopped. Then syncd container as the peer container should also be
stopped as expected. However, I found sometimes syncd container can be stopped, sometimes
it can not be stopped. The reason why syncd container can not be stopped is the process
(/usr/local/bin/syncd.sh stop) to execute the stop() function will be stuck between the lines 164 –167. Systemd will wait for 90 seconds and then kill this process.
164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done
The first thing I did is to profile how long this while loop will spin if syncd container can be
normally stopped after swss container is stopped. The result is 5 seconds or 6 seconds. If syncd
container can be normally stopped, two messages will be written into syslog:
str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134
The second thing I did was to add a timer in the condition of while loop to ensure this while loop will be forced to exit after 20 seconds:
After that, the testing result is that syncd container can be normally stopped if swss is stopped
first. One more thing I want to mention is that if syncd container is stopped during 5 seconds or 6 seconds, then the two log messages can be still seen in syslog. However, if the execution
time of while loop is longer than 20 seconds and is forced to exit, although syncd container can be stopped, I did not see these two messages in syslog. Further, although I observed the auto-restart feature of swss container can work correctly right now, I can not make sure the issue which syncd container can not stopped will occur in future.
**- How I did it**
I added a timer around the while loop in stop() function. This while loop will exit after spinning
20 seconds.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.
Modified caclmgrd behavior to enhance control plane security as follows:
Upon starting or receiving notification of ACL table/rule changes in Config DB:
1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions
2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute
3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute
4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages
5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets
6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets
7. Add iptables/ip6tables commands to allow all incoming BGP traffic
8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP)
9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured)
10. Add iptables rules to drop all packets destined for loopback interface IP addresses
11. Add iptables rules to drop all packets destined for management interface IP addresses
12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses
13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses
14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute)
15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices
Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present
Signed-off-by: Neetha John <nejo@microsoft.com>
* [dhcpmon] Filter DHCP O/A Messages of Neighboring Vlans
This code fixes a bug where two or more vlans exist. Cross contamination
happens for DHCP packets Offer/Ack when received on shared northbound links.
The code filters out those packet based on dst IP equal Vlan loopback IP.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Include platform info in name.
Get SONiC Version as parameter and use
Make additional tag as optional.
Avoid repetitions by using function.
* Per review comments, make SONIC_VERSION optional and added some comments.
* 1) Added additional params are optional
2) Handle DOCKER_IMAGE_TAG only if given
3) Use BUILD_NUMBER only if SONIC_VERSION not given
4) Tag with SONIC_VERSION if given.
Current behavior is not changed, unless SONIC_VERSION is given.
* Update per review comments
1) Added new args with options
2) Handle PORT possible being empty
3) Exhibit new behavior only if both version & platform are given.
* Drop redundant quotes
* fixes an issue when /host/warmboot/issu_bank.txt is empty/corrupted
switch is not able to over come this and enters continuos reload/reboot
failure.
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Submodule src/sonic-utilities f431510ae..d7e8f84cf:
Fix issue of fields overwritten before display (#863)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Co-authored-by: Guohan Lu <lguohan@gmail.com>
- What I did
Add configuration to avoid ntpd from panic and exit if the drift between new time and current system time is large.
- How I did it
Added "tinker panic 0" in ntp.conf file.
- How to verify it
[this assumes that there is a valid NTP server IP in config_db/ntp.conf]
Change the current system time to a bad time with a large drift from time in ntp server; drift should be greater than 1000s.
Reboot the device.
Before the fix:
3. upon reboot, ntp-config service comes up fine, ntp service goes to active(exited) state without any error message. This is because the offset between new time (from ntp server) and the current system time is very large, ntpd goes to panic mode and exits. The system continues to show the bad time.
After the fix:
3. Upon reboot, ntp-config comes up fine, ntp services comes up from and stays in active (running) state. The system clock gets synced with the ntp server time.
admin@sonic:~$ sudo hw-management-wd.sh
Usage: hw-management-wd.sh start [timeout] | stop | tleft | check_reset | help
start - start watchdog
timeout is optional. Default value will be used in case if it's omitted
timeout provided in seconds
stop - stop watchdog
tleft - check watchdog timeout left
check_reset - check if previous reset was caused by watchdog
Prints only in case of watchdog reset
help -this help
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Submodule src/sonic-utilities e9747899a..f431510ae:
> [201811][intfutil] set speed to 0 when interface speed is not available (#840)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
We believe that the supervisord issue in face of clock rolling backwards
has been addressed. Therefore reverting change 2598 to allow ntp sync
to right clock at the start up time.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [bgpcfgd]: Generate set src configuration dynamically
Sometimes zebra starts faster then swss configures Loopback0
In this case "set src" inside of the route map will not be
inserted to the configuration because zebra doesn't see Loopback
ips in the list of available ip addresses.
I've added extra logic to push the "set src" configuration only
when Loopback has been configured by swss.
* [interfaces-config.sh] Flush the loopback interface before configure it
Without this, you may end up with more and more ip addresses
on loopback interface after you change the loopback ip and do config reload
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
Increased netlink socket receive buffer size for zebra. Otherwise we receive following messages sometimes:
zebra[60]: netlink-listen recvmsg overrun: No buffer space available
If sonic-cfggen is passed the -m argument (to load the minigraph file) along with one or more -j <json_file> arguments, load the JSON files before loading the minigraph file.
This ensures that the init_cfg.json file is loaded before the minigraph, therefore the minigraph can override any default configuration options specified in init_cfg.json. Currently, the behavior is reversed.
Note: This is not an issue if loading loading multiple JSON files, because sonic-cfggen loads them in the left-to-right order they were specified on the command line, therefore providing flexibility for loading JSON files in a specific order. As long as init_cfg.json is specified before config_db.json, the values specified in config_db.json will take precedence.
Submodule src/sonic-sairedis aa092506f..c61d15947:
> [201811][SAI] move SAI sub module head to v1.4.3 (#567)
> [201811][sai] advance SAI submodule (#566)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-utilities 49ab6b1f8..e9747899a:
> [reboot] make sure the reboot happens even if platform reboot failed (#819)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Fix the issue of incorrectly skipping the convertfs hook when fast-reboot from EOS, by adding an extra kernel cmdline param "prev_os" to differentiate fast-reboot from EOS and from SONiC.
This is because we still do disk conversion for fast reboot from eos to sonic, like format the disk.
* Add new CIG device CS6436-54P and CS5435-54P, also update code for CS6436-56P
* Update port_config.ini
update port config file started from "Ethernet1"
* Update port_config.ini
Update port config file started from "Ethernet1"
* Update port_config.ini
Update the index column in the port config file
* Update sfputil.py
Update sfputil python file as the port index changed
* Update port_config.ini
Update the index column in the port config file
* Update sfputil.py
Update sfputil python file as the port index changed
Before the fix:
lldpcli configure ports Ethernet96 lldp portidsubtype local 'Eth1/1' description 50G|sonic1|Eth1/3/2
bash: sonic1: command not found
bash: Eth1/3/2: No such file or directory
After fix:
lldpcli configure ports Ethernet96 lldp portidsubtype local 'Eth1/1' description '50G|sonic1|Eth1/3/2'
run successfully.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
Force "lo" interface down in interfaces-config.sh to prevent interface-config.service from failing with the following error:
```
-- The result is failed.
systemd[1]: networking.service: Unit entered failed state.
systemd[1]: networking.service: Failed with result 'exit-code'.
interfaces-config.sh[29232]: Job for networking.service failed because the control process exited with error code.
interfaces-config.sh[29232]: See "systemctl status networking.service" and "journalctl -xe" for details.
interfaces-config.sh[29232]: ifdown: interface lo not configured
interfaces-config.sh[29232]: RTNETLINK answers: File exists
interfaces-config.sh[29232]: ifup: failed to bring up lo
systemd[1]: interfaces-config.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Update interfaces configuration.
-- Subject: Unit interfaces-config.service has failed
```
Failure to bring down the interface will result in a failure to subsequently bring the interface back up.
Submodule src/sonic-utilities 7a265b85..23cc3094:
> [neighbor advertiser] remove http endpoint access (#792)
> [fast/warm reboot] watchdog log message mentions the right reboot type (#791)
> [fdbshow][nbrshow] Print interface OID in lieu of name if there is no OID->interface name mapping (#789)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Since the syncd process running on different platforms will have the different full path names, we
change the full path name of process syncd in the monit config file such that it will be universal and is not for a specific vendor.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [dhcp-relay]: Add DHCP Relay Monitor (#3886)
DHCP relay MONitor (dhcpmon) keeps track of DORA messages. If DHCP Relay
is detected to be not forwarding DORA message, dhcpmon will log such event
to syslog. Under the hood dhcpmon keeps counts of clients DR messages,
forwarded DR messages, DHCP server OA messages, and forwarded OA messages.
dhcpmon will check every 12 sec (configurable) if counts are monotonically
increasing and record snapshot of those counters. dhcpmon will report
discrepancies when detected between current counters and snapshot counters.
pull-request: https://github.com/Azure/sonic-buildimage/pull/3886
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Eliminate dependency on libexplain
* Remove dependency on libexplain
* Add a monit config file for teamd container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a copy mechanism to put the monit config file in teamd container
into base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a monit config file for snmp container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a copy mechanism to put the monit config file of snmp container into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a monit config file for dhcp_relay container in the dir
base_image_files.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a copy mechanism to put the monit config file of dhcp_relay
container into base image under /etc/monit/conf.d.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a monit config file for router advertiser container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* Add a copy mechanism to put the monit config file of router advertiser
contianer into base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-Pmon] Add a monit config file for pmon container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-Pmon] Add a copy mechanism to put the monit config file into the
base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-lldp] Add a monit config file for lldp container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-lldp] Add a copy mechanism to put the monit config file into the
base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-bgp] Add a monit config file for BGP container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-bgp] Add a copy mechanism to put monit config file into the base
image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-swss] Add a monit config file for the swss container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-swss] Add a copy mechanism to put monit config file into the
base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on barefoot
platform.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on barefoot.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on broadcom.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on broadcom.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on cavium.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-centec] Add a monit config file for syncd container on centen
platform.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on centen
platform.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on marvell.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit conifg file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on
marvell-arm64.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on marvell-arm64.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on
marvell-armhf.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on mellanox.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a monit config file for syncd container on nephos.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-sflow] Add a monit config file for sflow container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-sflow] Add a copy mechanism to put the monit conifg file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-telemetry] Add a monit config file for telemetry container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-telemetry] Add a copy mechanism to put the monit config file
into the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-database] Add a monit config file for database container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-database] Add a copy mechanism to put the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-Dhcprelay] Change a typo.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-Dhcprelay] Change the process name in monit config file to
dhcrelay.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] There is no desserve process in syncd container on
barefoot.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] There is no process desserve in syncd container on
cavium.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] There is no process named desserve in syncd on centec.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] There is no process named desserve in syncd on marvell.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Should not delete the process desserve in syncd container
on marvell.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Delete the process dsserve in syncd on marvell.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Delete the process dsserve in syncd container on
marvell-arm64.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Delete the process dsserve in syncd container on
marvell-armhf.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Delete the process dsserve in syncd container on
mellanox.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-Radv] Change the process name to radvd.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-telemetry] Correct a typo in monit_telemetry.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-teamd] Delete the monit config file for teamd.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-teamd] Delete the mechanism to copy the monit config file into
base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-dhcprelay] Delete the monit config file for dhcp_relay
container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-dhcprelay] Delete the mechanism to copy the monit config file
into the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-radv] Delete the monit config file foe radv container.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-radv] Delete the mechanism to copy the monit config file into
the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-bgp] change the monit config file for BGP container such that
monit only generates alert if the process is not running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-snmp] Change the monit config file for snmp container such that
monit only generates alret if the process is not running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-pmon] Change the monit config file for pmon container such that
monit only generates alert if the processes are not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-lldp] Change the monit config file for lldp container such that
monit only generates alerts if some processes are not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-pmon] Delete the monit config file for pmon container since some
of processes are not running depended on the type of box.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-pmon] Delete the copy mechanism to copy the monit config file
into the base image.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-lldp] Change the matching name for the process lldpd.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-swss] Change the monit config file for swss container such that
monit only generates alerts if the processes are not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
barefoot such that monit only generates alerts if the process is not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Correct a typo in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
broadcom such that monit only generates alerts if the processes are not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
cavium such that monit only generates alerts if the process is not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container such
that monit only generates alerts if the process is not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
marvell such that monit only generates alerts if the process is not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
marvell-arm64 such that monit only generates alerts if the process is
not running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
marvell-armhf such that monit will generate alert if the process is not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Change the monit config file for syncd container on
mellanox such that monit only generates alerts if the process is not
running for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-sycnd] Change the monit config file for syncd container such
that monit only generates alerts if the processes are not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-sflow] Change the monit config file for sflow container such
that monit only generates alerts if the process is not running for 5
minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-telemetry] Change the monit config file for telemetry container
such that monit only generates alerts if the processes are not running
for 5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-database] Change the monit config file for database container
such that monit only generates alerts if the process is not running for
5 minutes.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-database] Use 4 spaces to replace 2 spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-bgp] Use 4 spcess to replace 2 spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-lldp] Use 4 spaces to replace 2 spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-swss] Use 4 spaces to replace 2 space in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-sflow] Use 4 spaces to replace 2 spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-snmp] Use 4 spaces to replace 2 spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-telemetry] Use 4 spaces to replace 2 spaces in monit config
file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on barefoot.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on broadcom.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on cavium.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on centec.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on marvell.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on mellanox.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-syncd] Use 4 spaces to repalce 2 spaces in the monit config file
on nephos.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Docker-bgp] Remove the trailing extra spaces in monit config file.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [201811][monit] address build issue: hard code ARCH to amd64
- also hard code the debian package path as in 201811 branch.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.
* Avoid reloading already uploaded file, by marking the names with a prefix.
* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.
* Fix few bugs
* The binary update_json.py will come from sonic-utilities.
* Corefile uploader service
1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.
* Remove rw permission for .rc file for group & others.
* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.
* Azure storage upload requires python module futures, hence added it to install list.
* Removed trailing spaces.
* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
- What I did
Added Daemon to Log LPC bus degradation in Intel C2000 processor. Intel Rangeley C2000 processors with revision less than or equal to 2 have issue where LPC bus degrades over time in some processors. To identify the problem and to notify the issue, a daemon has been added which will log on encountering the issue.
- How I did it
Added a daemon which validates the CPLD scratch(0x102) and SMF scratch(0x202) registers by writing and reading values on regular polling intervals (300 seconds). If there is a discrepancy between read and write, a critical log will be thrown.
- How to verify it
The infra is verify by simulating the issue where between write and read, the value in register is modified and the log appearance is checked.
- Description for the changelog
Added Daemon to identify LPC bus degradation issue and notify using syslog in Dell S6100 and Z9100 platforms. This daemon will only run on processors with revision less than or equal to 2.
Rely on platform= and sid= on the command line to detect the platform rather than the eeprom
The platform will now properly initialize even if the system eeprom died or is unreachable.
Add support for the 7260CX3-64E
This is a variant of the 7260CX3-64 with no real difference for software.
If we need to stop swss during fast-reboot procedure on the boot up path,
it means that something went wrong, like syncd/orchagent crashed already,
we are stopping and restarting swss/syncd to re-initialize. In this case,
we should proceed as if it is a cold reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
if yes the software reboot cause file will be treated as the reboot cause.
finish
2. check whether platform api returns a reboot cause.
if yes it is treated as the reboot cause.
finish.
3. check whether /hosts/reboot-cause contains a cause.
if yes it is treated as the cause otherwise return unknown.
* [process-reboot-cause]Fix review comments
* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG
* [process-reboot-cause]address comments
* refactor the code flow
* Remove escape
* Remove extra ':'
In place editing (sed -i) seems having some issues with filesystem
interaction. It could leave 0 size file or corrupted file behind.
It would be safer to sed the file contents into a new file and switch
new file with the old file.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* lldpctl: put a lock around some commands to avoid race conditions
* Read all notifications in lldpctl_recv
* lib: fix memory leak
* lib: fix memory leak when handling I/O
* Update series
* Add watchdog-control service to disable watchdog during bootup
Disable only if it's applicable and the watchdog is enabled.
* Address the review comment
* Correct the watchdog start script name
* Change to call common watchdog api instead of platform specific
* Start watchdog control service after swss starts
* advance sonic-utility submodule
Patch isc-dhcp-relay in order to allow the relay agent to discover configured interfaces even if they are down.
Without this patch, the relay agent will not discover configured interfaces if they are down when the relay agent starts up. If the interface(s) then get brought up after the relay started, the relay will discard packets received on these interfaces and log the message, Discarding packet received on <iface_name> interface that has no IPv4 address assigned. This led to race conditions when starting SONiC (or loading configuration). To resolve this, the relay agent would need to be restarted with all configured interfaces up.
With this patch, the relay agent will discover all configured interfaces, whether or not they are up at the time the relay agent starts. Thus, the state of the configured interfaces can be down when the relay agent starts and brought up during the lifetime of the relay agent process, and the relay agent will relay packets as expected; it will not discard them.
* Added debug symbol to dhcp-relay.
Note: Master is different; Hence explicitly for 201811 only.
* Include debug symbols of isc-dhcp in its debug docker.
Include isc-dhcp src in source archive.
This change is intended to fix the issue with dpkg-query during build
process.
The symptom is dpkg-query failed to open package info file, usually
/var/lib/dpkg/updates/000?
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Revert "Configure buffer profile to all ports (#3561)" (#3628)
Configure buffer profile to all ports (#3561)
This reverts commit 8861cbe98e.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
Submodule src/sonic-utilities 2ca1ae1..3ed25a4:
> Do not start pfcwd for M0 devices (#726)
> Make configlet application script idempotent for updates. (#728)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* BGPm for 201811 (#3601)
* Feature is downported
* Add monitors to the test minigraphs
* Test
* No pfx filer
* Fix bgp sample
* Quagga requires to activate peer-group before configuration
* Add bgpcfgd and bgpd.peer template
* Catch exception if rendering external template
* Fix tests
* Update minigraph.py to filter out front-panel ports that are not active
* Update cfggen tests to reflect new behavior
Signed-off-by: Danny Allen <daall@microsoft.com>
* Incorporate PR comments
- Update t0 tests to include additional device neighbors
- Refactor xml parsing logic
Submodule src/sonic-swss f09ddb4..49c9c16:
> Allow buffer profile apply after init (#1099)
> [aclorch]: Check for existing mirror table only when creating a new table (#1089)
> [201811] [portsorch] fix PortsOrch::allPortsReady() returns true when it should not (#1116)
> Address review comment: remove data member m_entriesCreated, which is introduced for dependancy resolution purpose. (#839)
> Fix PFC watchdog not getting lossless TC (#876)
Submodule src/sonic-utilities c049e54..2ca1ae1:
> Add a generic configlet application script (#716)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
interfaces-config service configures lo address. If bgp service
starts before lo address is configured, then following config
in zebra will not be applied.
route-map RM_SET_SRC permit 10
set src 10.1.0.32
The adds a few seconds delay in bgp service start
Submodule src/sonic-swss 98cfe56..f09ddb4:
> [fix] Use the same storm detection condition for queue occupancy non-zero case as the zero case (#1111)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
We noticed in tests/production that there is a low probability failure
where /etc/hosts could have some garbage characters before the entry for
local host name. The consequence is that all sudo command would be very
slow. In extreme cases it would prevent some services from starting
properly.
I suspect that the /etc/hosts file might be opened by some process causing
the issue. Editing contents with new file level and replace the whole file
should be safer.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.
Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.
- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.
- How to verify it
Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.
Submodule src/sonic-utilities fb5902f..d315dd7:
> show pfcwd status to be 'N/A' when pfcwd is stopped (#682)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Feature is downported
* Add monitors to the test minigraphs
* Test
* No pfx filer
* Fix bgp sample
* Quagga requires to activate peer-group before configuration
- after reloading minigraph, write latest version string in the DB.
- if old config_db.json file exists, use it and migrate to latest version.
- only reload minigraph when config_db.json doesn't exist and minigraph
exists.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-utilities 54946e9..5b1fa3c:
> [neighbor_advertiser] Verify that DIPs returned from ferret are not in device VLAN (#670)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-swss 0e5c1ff..fcd091c:
> [mirrororch]: Remove mirror session state after it is remvoed (#1066)
Submodule src/sonic-utilities a89b9d4..54946e9:
> [acl_loader]: Add monitor port column in show mirror_session output (#662)
> [warm/fast reboot] some service docker might have been stopped already (#668)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Remove the divide by 4 operation to the under the hood SAI
This is to avoid the need and thus the confusion for application program to know
the mmu internal architecture
This change must have support from SAI change to reach the correct
config
Signed-off-by: Wenda <wenni@microsoft.com>
* Relegate the divide by 4 operation to the under the hood SAI for egress
lossless pool
Extend to 7060 and 6100
Signed-off-by: Wenda <wenni@microsoft.com>
* Add more TH/TH2 hwskus
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Update config test
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Add TH2 ingress lossy profile
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* Move the divide by 4 operation to SAI internal
Signed-off-by: Wenda Ni <wenni@microsoft.com>
* [bcm SAI] Upgrade Broadcom SAI to version 3.5.3.1-15
- Broadcom SAI 3.5 GA release 20190924.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* add config.bcm for hlx
* modify config.bcm path for hlx
* Delete hx4-cel-hbtn-48x1G+4x10G.config.bcm
* add config.bcm and path
* update led for cxp
* Add new device data for dx010
* Add debug docker for SNMP.
* Removed a redundant install of debug packages.
Propagate the debug flag to template file to mount /dbg & /src to debug containers.
* Revert the last change to retain the original
Submodule src/sonic-sairedis 1cf2eea..55ec4d2:
> [syncd]: support query port with 8 lanes (400G)
Submodule src/sonic-swss 2974844..24fcbb6:
> support 8 lanes for a physical port (#778)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-utilities f76fc2c..fe2c656:
> [warm-reboot]: Do not clean up mirror session state database (#639)
> [config] Reset failed status of all SONiC services, whether or not they are currently failed (#619)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script
Signed-off-by: Danny Allen <daall@microsoft.com>
* Update interval for running cron job
* Respond to feedback
* Change syslog id
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
Submodule src/sonic-sairedis 4ee82cb..1cf2eea:
> Add acl counter match logic based on acl entry field (#511)
> Add specific comparison logic for ACL counter (#484)
Submodule src/sonic-swss 46bc1f4..660530e:
> Fix VLAN error introduced with new 4.9 kernel behavior (#1001)
> Warmboot Vlan neigh restore fix (#1040)
Submodule src/sonic-utilities 11b4cf1..f76fc2c:
> [warm reboot] Skip ASIC config pre-check if current image does not support it (#637)
> [FastReboot]: Send SIGINT to all teamd before stop (#633)
> [warm/fast reboot] provide strict option to prevent warm reboot under certain conditions (#631)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* advance pointer for sonic-platform-daemon to 9e2eb29e6e55a116b424faa63f510c7cdeeef7eb
[xcvrd] fix issue: xcvrd fails due to syntax error after sequential reboot (#43)
* advance pointer for sonic-platform-common to ac7fde6e9ce532d450b3c43f354fc2f128053b4f
[sonic_sfp] fix syntax error in sfputilbase._read_eeprom_specific_bytes (#58)
This piece of information is currently not used. Revert this
pull request in the future to add back the default mirror
session information into the configuration database.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
[build_debian] Generate checksum of ASIC config files
* Adds script to generate checksums for ASIC config files
* Adds step to build_debian that copies ASIC config checksum into SONiC filesystem
Signed-off-by: Danny Allen daall@microsoft.com
Submodule src/sonic-utilities 4f72e14..11b4cf1:
> [fast-reboot] Check if ASIC config has changed before warm reboot (#621)
> [neighbor_advertiser]: Change the ICMPv6 type to 135 (#629)
> [acl_loader]: Fix show mirror_session error (#580)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
There is a very low possibility that qsfp_status exists when opening it but doesn't when reading it a bit later. If this occurs an exception is raised.
However, this should be treated as "not presence" rather than causing an exception, which can be achieved by putting it into try block.
Fixed errors in the following files to resolve build failures
- docker-ptf-nephos.mk\docker-syncd-nephos.mk\libsaithrift-dev.mk\rules.mk
- Upgrade sai.mk for support sai_1.4.1 and upgrade sdk version to 3.0.0
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
ndisc6 gathers a few diagnostic tools for IPv6 networks including:
- ndisc6, which performs ICMPv6 Neighbor Discovery in userland,
- rdisc6, which performs ICMPv6 Router Discovery in userland,
- rltraceroute6, a UDP/ICMP IPv6 implementation of traceroute,
- tcptraceroute6, a TCP/IPv6-based traceroute implementation,
- tcpspray6, a TCP/IP Discard/Echo bandwidth meter,
- addrinfo, easy script interface for hostname and address resolution,
- dnssort, DNS sorting script.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
- Improve smbus reliability for all platforms
- Delay processing of the reboot cause to reduce critical path
- Add support of get_change_event for PSUs
* fix sfpd initialize issue
* fix review comments
* rephrase the output log
* fix retry counter
* change the retry time to 10, means set max waiting time 1024s
* fix mlnx-sfpd init flow with new solution
* [mlnx-sfpd] address comments
1. wait for 5 seconds * 30 times, 150 seconds totally. use constant wait time for each retry.
2. use try/except structure so that error can be handled in a graceful way
* [mlnx-sfpd] wait 5 seconds after SDK_DAEMON_READY_FILE exists to make sure SDK is fully up.
* [mlnx-sfpd]simplify initialization by using deinitialize on initializing failure
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
As well convert priority to integer for use by sort.
Submodule src/sonic-sairedis 54c8e78..992cdc0:
> Do not store invalid OIDs from FDB notification into ASIC DB (#503)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service dependent] describe non-warm-reboot dependency outside systemctl
When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
when device disk is small, do not unzip dockerfs.tar.gz on disk.
keep the tar file on the disk, unzip to tmpfs in the initrd phase.
enabled this for 7050-qx32
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [Mellanox/mlnx-platform-api] enable mellanox's platform-api to be loaded as a whole.
* [chassis] update reboot-cause handling code to adapt the hw-management currently running on 201811
* [chassis]handle the case that reboot cause file can be any dir matching pattern "hwmonX".
The race condition could happen like this:
When an interface is enslaved into the port channel immediately after
it is created, the order of creating the ifinfo and linking the ifinfo to
the port is not guaranteed.
Please check the patch commit message to get full details.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* [warm reboot] save configuration after warm reboot
After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Update finalize-warmboot.sh
* backport new platform api to 201811, reboot cause part
* install new platform api on host
* 1. remove chassis's dependency on sonic_platform_daemon.
2. add some mellanox-specific hardware reboot causes.
3. fix typo in files/image_config/process-reboot-cause/process-reboot-cause.
* 1. add dependency of sonic_platform for base image
2. handle the case of reboot cause file not found
* adjust log message.
* [201811][sairedis][swss] advance sub modules head
Submodule src/sonic-sairedis 18ad5f9..4c75b7f:
> Fixed conditional operator. (#487)
Submodule src/sonic-swss 1e99c93..cd12d48:
> [teamsyncd]: Add information for LAG membership changes (#982)
> Fix vlan incremental config and add vs test cases (#799)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [swss] include more swss changes
Submodule src/sonic-swss cd12d48..f44029d:
> [MirrorOrch]: Init the next hop ip with 0 instead of default constructor (#953)
> [AclOrch]: Fix the acl mirror counter doubled by inactive mirror and active again (#952)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Subscribe to both ConfigDB and AppDB
to get notifications to apply LLDP port config
* the operstate file is not consistent
Removing this since it is not serving any purpose
* Remove check for PortInitDone and PortConfigDone
This is not prteset in Config DB
* Remove checking State DB for port creation
* Check for key to be present before fetching it
* Addressing review comments
Integrating official Mellanox SDK/FW release as a pre condition for getting new Mellanox SAI release with hash changes (inner field) and 3k VXLAN scale. As well as bug fix for Spectrum LP mode.
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
* fix name conflict between sonic_platfrom package and sonic_platform.py
* update sonic-utility submodule to pickup lastest fix
* Revert "update sonic-utility submodule to pickup lastest fix"
This reverts commit f66aa99738.
* update sonic-utility sub module
- Make sure that migrated DB contents persisted for next boot
- Make sure that db saved after warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [dell/Z9100] Fix for optics not detected in fanout mode
This commit fixes the issue of optics not detected
error while running sfputil show eeprom command. The root
casuse was the value of port index from port_config.ini for
fan out scenario. The port index should be starting from 0
and not 1. Platform cpld registers are assuming the port
numbers to start from 0 (lowermost bit), sfputils.py uses this
port number in get_presence function. Since the indexing passed
is wrong the optics was not detected and gave SFP EEPROM not
detected message.
Signed-off-by: Harish Venkatraman <Harish_Venkatraman@Dell.com>
* [dell/z9100] Fix for optics not detected in fanout mode
This commit fixes the issue of optics not detected error
while running sfputil show eeprom command. The root cause
was wrong port_index in fan out scenarios. Earlier fix of
changing the port_config.ini is reverted and changes made
in z9100 platform specific sfputil.py file. The port number
is decrement and tested for both 100G and 50G fanout cases.
Tested for the following show commands and test was succesful
show interfaces status, show interfaces transceiver eeprom,
show interfaces transceiver lpmode, show interface tranceiver
presence.
Signed-off-by: Harish Venkatraman <Harish_Venkatraman@Dell.com>
Submodule src/sonic-platform-common 42119e1..5d7954e:
> [ChassisBase] Make reboot cause constant strings human-readable (#35)
> Add .gitignore file (#28)
> [sonic_platform_base] Add sonic_sfp and sonic_eeprom to sonic_platform_base (#27)
> Enhance new platform API (#19)
> fix typo in platform API base class (#25)
Submodule src/sonic-swss 9cf7b01..1e99c93:
> Set timer only when interval changes. Not in each firing of the timer. (#945)
Submodule src/sonic-utilities ec1e93f..24958f1:
> [fast reboot] stop removing opennsl module before reboot (#560)
Submodule src/sonic-swss-common b472f6e..d6140fa:
> timerfd:read failure - Record in logs as error. (#295)
> do not abort when read timerfd return 0 and errno = 0 (#291)
> Add an assert to logger, which will log a message and abort. (#286)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* fix the memory leak in on_pmpe. objects created via sx_api having convention new_<type_name> should be release explicitly via delete_<type_name>.
* avoid duplicate code.
* fix fast reboot compatibility
We should handle both cases for backward-compatible with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
- What I did
Configure sshd to close all SSH connetions after 15 minutes of inactivity.
- How I did it
Set ClientAliveInterval to 900 (900 seconds = 15 minutes) and ClientAliveCountMax to 0
in /etc/ssh/sshd_config using augtool in build_debian.sh. In the process, I refactored the existing augtool command for sshd_config so as to add comments and empty lines to file for readability.
- How to verify it
Log into device via management port. Wait 15 minutes without sending a keystroke -- you should be automatically logged out.
2) Install debug tools in every debug docker image
3) Install available debug symbols in debug docker image
4) Provide additional host/docker mapping for host dirs /src & /debug
4.1) The one-image will have source code under /src
4.2) /debug is mapped as rw. User can put his core file there and use this dir to
collect debug session logs too.
5) Build debug image using debug dockers
6) Source code is archived into /src of debug image
7) The welcome banner is extended to display these additional facilities in debug image.
Upgrade the version of rsyslog installed in Docker containers to the latest version available from jessie-backports repo (currently 8.23.0-2~bpo). Based off my change to the 201803 branch (#2709). This should eliminate some memory leaks and will prevent any regressions if moving from the 201803 branch to the 201811 branch.
* [submodule] update sonic-linux-kernel (#2985)
* Fix many version strings
* Update minor version
* Update arista-drivers submodule (#9)
* Rebuild SDK on new kernel (#10)
* Set the default mac ageing time to 300 seconds
The current mac ageing was disabled, this could lead the mac address
table to increase over time and lead to resource and performance issues.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Update the default HW ageing timer to be 600 seconds.
This is to be on the safer side where ARP update interval
is 300 seconds and SONiC does not flood when ARP is aged out.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
- Broadcom SAI GA version 20190513
- Broadcom fix for CS7999193, CS7913246, CS4529162, CS8180755, CS8242625
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- What I did
Currently when the system is under memory pressure, the OOM killer kicks in and kills a rogue process. Killing a rogue process can cause the device to be un-healthy leading to blackholing of the traffic.
To avoid this, configure the OOM to do a kernel panic which will cause the device to reboot and come back up healthy.
- How I did it
Added the sysctl variable panic_on_oom and set the value to 2.
Setting it to 2 will ensure OOM killer to always do a kernel panic.
Submodule src/sonic-utilities 6b4d1a0..46b5aa8:
> [show ip interface] Add support for 'alias' interface naming mode (#486)
Submodule src/sonic-swss 9c4ae18..a637562:
> Suppress storm detect counter increment for ongoing pfc storm case during a warm reboot (#869)
> Remove *_LEFT fields to allow PFC watchdog to enter fresh into the (#897)
> Set LAG mtu value based on kernel netlink msg (#922)
> [warm restart assist] assume vector values could be reordered (#921)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-utilities 7a2348c..4488525:
> use vlan members (#542)
> [sonic_installer] If asked to install an image which is already installed, simply set as default (#534)
Submodule src/sonic-swss 8246bd9..9c4ae18:
> Ignore neighbor entry with BCAST MAC, check SAI status exists (#914)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes
* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround
However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
* [mlnx] fix mlnx-sfpd shutdown
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* fix type and handle only EINTR and EAGAIN errors from select
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* handle select.error as well during init/run
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.
* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
* Change URL for isc-dhcp source repository
* Modify supervisor conf to generate dhcrelay commands with '-id' and '-iu' options
* Comments; Also clean up jinja2 syntax
* Patch relay to open one socket per interface and send to all servers on all upstream interfaces
* Patch relay agent to properly forward BOOTREQUEST only on appropriate interface if it is a directed broadcast
* Port upstream patches to isc-dhcp-relay to support upstream/downstream interfaces
* Update patch to properly support interfaces with multiple IP addresses assigned
* Pass --enable-use-sockets to configure instead of uncommenting USE_SOCKETS directly
Submodule src/sonic-utilities 6130695..a1f961c:
> update scheme variable name (#531)
> [teamshow]: Add * to indicate if the state has been synced into database (#395)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-swss e26e1d8..8246bd9:
> [watermarkorch] only perform periodic clear if the polling is on (#781)
Submodule src/sonic-utilities e3bb8b9..6130695:
> [reboot] log reboot progress and add a sanity check before reboot (#526)
> Fix TODO to get/set active ports only (#494)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- Add ebtables package, and install some filter rules:
1. ebtables -A FORWARD -d BGA -j DROP
2. ebtables -A FORWARD -p ARP -j DROP
Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* DellEMC S6000, xcvrd support
* sleep 1 second to avoid busy looping
* removal of dead code
* Correct typo error to 1 second
* Introduced 1 second sleep
* Revamped script with blocking call support
* get_transceiver_change_event api definition update
* adding timeout support for get_transceiver_change_event
Port libteam patch which fixes the race condition we observed during
warm reboot.
Remove early patches: 0006, 0008, 0009.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-sairedis 74f0f44..d027eae:
> [SAI header] upgrade SAI header to version v1.3.7 (#445)
Submodule src/sonic-utilities 0f7e75c..9005508:
> Bring queue storm status to 'pfcwd show stats' (#500)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-swss ae74a27..6e8f991:
> Create ingress table group during the PFCWD stats list installment (#815)
Submodule src/sonic-utilities 6ba6d27..0f7e75c:
> If fast-reboot-dump gives an error, don't continue with fast-reboot (#515)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Backport of
54f137c105
According to 6.4.15 of IEEE 802.1AX-2014, Figure 6-22, the state that the
port is selected moves MUX state from DETACHED to ATTACHED.
But ATTACHED state does not mean that the port can send and receive user
frames. COLLECTING_DISTRIBUTION state is the state that the port can send
and receive user frames. To move MUX state from ATTACHED to
COLLECTING_DISTRIBUTION, the partner state should be sync as well as the
port selected.
In function lacp_port_actor_update(), only INFO_STATE_SYNCHRONIZATION
should be set to the actor.state when the port is selected.
INFO_STATE_COLLECTING and INFO_STATE_DISTRIBUTING should be set to false
with ATTACHED mode and set to true when INFO_STATE_SYNCHRONIZATION of
partner.state is set.
In function lacp_port_should_be_{enabled, disabled}(), we also need to
check the INFO_STATE_SYNCHRONIZATION bit of partner.state.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
* Add new device CIG CS6436-56P
* Delete minigraph.xml
It isn't necessary in the current system, just delete it
* Update qos.json.j2
* Update port_config.ini
Add the speed column. The cmd to show interface status as:
root@switch1:~# show interface status
Interface Lanes Speed MTU Alias Oper Admin Type Asym PFC
----------- --------------- ------- ----- ------------ ------ ------- ------ ----------
Ethernet0 8 25G 9100 Ethernet1/1 up up SFP N/A
Ethernet1 9 25G 9100 Ethernet2/1 up up SFP N/A
Ethernet2 10 25G 9100 Ethernet3/1 down down N/A N/A
Ethernet3 11 25G 9100 Ethernet4/1 down down N/A N/A
Ethernet4 12 25G 9100 Ethernet5/1 down down N/A N/A
Ethernet5 13 25G 9100 Ethernet6/1 down down N/A N/A
Ethernet6 14 25G 9100 Ethernet7/1 down down N/A N/A
Ethernet7 15 25G 9100 Ethernet8/1 down down N/A N/A
Ethernet8 16 25G 9100 Ethernet9/1 down down N/A N/A
Ethernet9 17 25G 9100 Ethernet10/1 down down N/A N/A
Ethernet10 18 25G 9100 Ethernet11/1 down down N/A N/A
Ethernet11 19 25G 9100 Ethernet12/1 down down N/A N/A
Ethernet12 20 25G 9100 Ethernet13/1 down down N/A N/A
Ethernet13 21 25G 9100 Ethernet14/1 down down N/A N/A
Ethernet14 22 25G 9100 Ethernet15/1 down down N/A N/A
Ethernet15 23 25G 9100 Ethernet16/1 down down N/A N/A
Ethernet16 32 25G 9100 Ethernet17/1 down down N/A N/A
Ethernet17 33 25G 9100 Ethernet18/1 down down N/A N/A
Ethernet18 34 25G 9100 Ethernet19/1 down down N/A N/A
Ethernet19 35 25G 9100 Ethernet20/1 down down N/A N/A
Ethernet20 40 25G 9100 Ethernet21/1 down down N/A N/A
Ethernet21 41 25G 9100 Ethernet22/1 down down N/A N/A
Ethernet22 42 25G 9100 Ethernet23/1 down down N/A N/A
Ethernet23 43 25G 9100 Ethernet24/1 down down N/A N/A
Ethernet24 48 25G 9100 Ethernet25/1 down down N/A N/A
Ethernet25 49 25G 9100 Ethernet26/1 down down N/A N/A
Ethernet26 50 25G 9100 Ethernet27/1 down down N/A N/A
Ethernet27 51 25G 9100 Ethernet28/1 down down N/A N/A
Ethernet28 56 25G 9100 Ethernet29/1 down down N/A N/A
Ethernet29 57 25G 9100 Ethernet30/1 down down N/A N/A
Ethernet30 58 25G 9100 Ethernet31/1 down down N/A N/A
Ethernet31 59 25G 9100 Ethernet32/1 down down N/A N/A
Ethernet32 64 25G 9100 Ethernet33/1 down down N/A N/A
Ethernet33 65 25G 9100 Ethernet34/1 down down N/A N/A
Ethernet34 66 25G 9100 Ethernet35/1 down down N/A N/A
Ethernet35 67 25G 9100 Ethernet36/1 down down N/A N/A
Ethernet36 68 25G 9100 Ethernet37/1 down down N/A N/A
Ethernet37 69 25G 9100 Ethernet38/1 down down N/A N/A
Ethernet38 70 25G 9100 Ethernet39/1 down down N/A N/A
Ethernet39 71 25G 9100 Ethernet40/1 down down N/A N/A
Ethernet40 72 25G 9100 Ethernet41/1 down down N/A N/A
Ethernet41 73 25G 9100 Ethernet42/1 down down N/A N/A
Ethernet42 74 25G 9100 Ethernet43/1 down down N/A N/A
Ethernet43 75 25G 9100 Ethernet44/1 down down N/A N/A
Ethernet44 76 25G 9100 Ethernet45/1 down down N/A N/A
Ethernet45 77 25G 9100 Ethernet46/1 down down N/A N/A
Ethernet46 78 25G 9100 Ethernet47/1 down down N/A N/A
Ethernet47 79 25G 9100 Ethernet48/1 down down N/A N/A
Ethernet48 84,85,86,87 100G 9100 Ethernet49/1 up up QSFP28 N/A
Ethernet49 80,81,82,83 100G 9100 Ethernet50/1 up up QSFP28 N/A
Ethernet50 92,93,94,95 100G 9100 Ethernet51/1 down down N/A N/A
Ethernet51 88,89,90,91 100G 9100 Ethernet52/1 down down N/A N/A
Ethernet52 108,109,110,111 100G 9100 Ethernet53/1 down down N/A N/A
Ethernet53 104,105,106,107 100G 9100 Ethernet54/1 down down N/A N/A
Ethernet54 116,117,118,119 100G 9100 Ethernet55/1 down down N/A N/A
Ethernet55 112,113,114,115 100G 9100 Ethernet56/1 down down N/A N/A
root@switch1:~#
Submodule src/sonic-utilities 6aee909..79a0185:
> [fast/warm reboot] add some sanity check before warm reboot (#510)
> In sync with our latest change, where we default failthrough to be False. (#507)
> [generate_dump] system dump improvements (#503)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
There are some platforms with less powerful CPU/hard-drive could take
longer to get ready for BGP. For these platforms, 240 seconds would be
a safer threshold.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-sairedis 483c89e..97dd2a8:
> Fix compilation issues in stretch docker with gcc-6.3 (#426)
> Make object list deterministic when iterating (#438)
> Ignore ACL_COUNTER bytes and packets during comparison logic (#443)
Submodule src/sonic-swss d22b2de..ae74a27:
> Survive pfc watchdog storm action across warm-reboot (#794)
Submodule src/sonic-swss-common 36fd5e9..24c0ff7:
> Update PFC_WD table name in CONFIG_DB (#266)
Submodule src/sonic-utilities bae21e7..6aee909:
> [neighbor advertiser] convert int to string before concatenating (#505)
> [config]: Change the order of interface commands (#504)
> Change PFC watchdog CONFIG_DB table name from PFC_WD_TABLE to PFC_WD (#475)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
When adding a lag member dynamically after system boots up, teamd
port priv change handler could re-entrant itself and causing adding
operation to fail.
While handling PORT_CHANGE event, teamd_per_port.c port priv change
handler was called, it will then call runner_lacp to add port to lag,
the later causes IFINFO_CHANGE to be notified and calls the priv change
handler again, this re-entrance would cause runner_lacp port_added to
be called again and messes up with the previous adding sequence. Then
fails the lag member adding operation.
Prevent per port priv change handler re-entrance solves the problem.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [vm build] force Linux to drop cache before calling kvm
KVM need to allocate 2G memory for this build. The system memory might
be occupied by cache at the moment and doesn't have 2G chunk to give
out. Forcing Kernel to drop cache to boost the chance of getting 2G
memory.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [make] add option to enable/disable VS build memory preparation
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [snmpagent][swss-common] advance sub module head to include overlay SNMP
Submodule src/sonic-snmpagent 0f2bbd7..26f0495:
> Remove verbose feature missing logs (#102)
> Enable overriding interface counters OIDs (#98)
Submodule src/sonic-swss-common 5f4abd9..36fd5e9:
> Add new DB index for SNMP_OVERLAY_DB (#262)
Note: overlay DB also requires change in swss-common, which has been
moved ahead.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [utilities] advance utilities sub-module head
Submodule src/sonic-utilities 9d9aaa0..d1070b2:
> [warm-reboot] initialize warm reboot state table before warm rebooting (#492)
> Allow config shutdown and startup operations on valid PortChannel interface names (#474)
Race condition has been noticed after warm reboot: sometimes when
port_changed notification was received, the link message didn't
have the device name. Without device name, creating team port
would fail.
Registering to the interface information change notification, so
later when device name becomes available, retry creating team port.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-utilities f95da07..2fe01fe:
> neighbor advertiser script (#469)
> [aclshow] restore PRIO column and sort entries by priority (#476)
> Update watermark default polling interval to 10s (#470)
> show interface status <interface-name> throws error (fixes#427) (#440)
Submodule src/sonic-swss 90eb25d..91171b6:
> fix a unstable swss egress acl test (#776)
> [aclorch] Remove L4 port range support limitation on egress ACL table and add new SWSS virtual test. (#741)
> Fix orchagent SEGV when PortConfigDone not set (#803)
Submodule src/sonic-swss-common 2592b0c..5f4abd9:
> Force only supported commands on consumer table (#261)
> Add multiple fields hdel support (#267)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Fixes "No ISSU version file found /etc/mlnx/issu-version"
when rebooting to different image;
Add aditional check condition.
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
PFC_WD table naming change is not ready to be included yet.
Submodule src/sonic-swss-common c674e64..2592b0c (rewind):
< Add multiple fields hdel support (#267)
< Update PFC_WD table name in CONFIG_DB (#266)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Submodule src/sonic-sairedis 54b02a1..8182916:
> Add pre match to comparison logic and unittests (#423)
> Drop FDB notifications if they contain invalid OIDs (#428)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Add a log message for each notification of add/del TACACS server.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* Moved another syslog message from DEBUG to INFO to be able to see those notifications.
All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.
Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
* fe60afa 2019-02-12 | [lldpsyncd] remove dbsyncd logic which is moved to lldpmgr (#15) (HEAD, origin/master, origin/HEAD) [Mykola F]
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [minigraph.py] generate mandatory default port description
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
* use port name as default description
* [config-engine] update test exaple output
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
* [minigraph.py] use alias/port name as default description instead of neighbor data
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
- [config/main.py]Fixed - added a validation such that delete portchannel or portchannel members only when it is configured (#277) (#445)
- Revert "[config/main.py]Fixed - added a validation such that delete portchannel or portchannel members only when it is configured (#277) (#445)" (#452)
- [intfutil] add Asymmetric PFC status to 'show interface status' (#437)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
* [submodule 201811] advance sairedis and swss submodule for 201811 branch
Submodule src/sonic-sairedis 60f97c3..bdb0074:
> Add more information on failed map sizes (#416)
> Add WRED specific comparison logic (#413)
> Initialize notification queue pointer before switch create (#411)
> Add log info for not matching SG/IPG/QUEUES (#409)
> Add support for any number of ports in virtual switch using lane map (#407)
Submodule src/sonic-swss 85f6322..65a0256:
> Increase the watermark polling interval to 10s (#777)
> [vstest]: fix test_port_an_warm.py test (#779)
> [wartermarkorch] Fix repeated m_pg_ids and m_unicast_queue_ids add up issue (#752)
> [teammgrd] Fix inconsistent port admin status (#755)
> Remove AclTableGroup upon removal of port/lag/vlan (#751)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [submodule] advance sonic-swss submodule
Submodule src/sonic-swss 65a0256..bf21ee3:
> On a routing vlan, the neighbor entry in the /31 subnet is not added … (#784)
> portsorch ports init done flag should means buffer, autoneg, speed, m… (#747) (#783)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Removed deployment_id_asn_map.yml from copy list
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* QoS config change: 1) DSCP mapping; 2) link pg/queue 6 to lossy buffer;
3) redistribute scheduler
Signed-off-by: Wenda <wenni@microsoft.com>
* Add scheduling weight to queue 2
Signed-off-by: Wenda <wenni@microsoft.com>
* Link pg/queue 2 to lossy buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* Update the pg headroom for a7060-D48C8 50G
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for qos
Signed-off-by: Wenda <wenni@microsoft.com>
* Update pg headroom size, and update egress lossy pool size accordingly
Signed-off-by: Wenda <wenni@microsoft.com>
* Update headroom pool size; Update ingress service pool and egress lossy
pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* a7260: update headroom pool size; Update ingress service pool and egress lossy pool sizes accordingly;
Signed-off-by: Wenda <wenni@microsoft.com>
* Update config gen test for buffer
Signed-off-by: Wenda <wenni@microsoft.com>
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades
* [sonic-utilities] Update submodule to include related changes
The sai profile itself can support 32x50G+16x100G/40G while
the initial port_config.ini uses 40G speeds for port 17-32.
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
To include following changes:
- [teammgrd]: Add retry logic for starting port channel with teamd (#756)
- [portsorch] fix bug in initializePort (#753)
- [intfmgrd] Fix intfmgrd hanging untill first interface becomes ready (#748)
- [intfmgrd]: Support loopback (#742)
-Improve comments for neighbor warmrestart related functions and warmRestartAssist class (#740)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
When rebooting without the platform_reboot plugin, systemd takes a few
minutes to properly shutdown. It's blocking on some docker cleanup
operation.
As described by https://github.com/docker/for-linux/issues/421 there
is a race between docker.service and containerd.service.
docker needs containerd to properly stop the containers.
The race condition could happen like this:
When an interface is enslaved into the port channel immediately after
it is created, the order of creating the ifinfo and linking the ifinfo to
the port is not guaranteed.
Please check the patch commit message to get full details.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
dell_ich module fails to load sometimes due to the failure of pci_get_drvdata().
This function is responsible for fetching INTEL PCI related memory handle in kernel. This is implemented in lpc_ich kernel module.
Due to race in addition/deletion of kernel modules, sometimes lpc_ich loads after dell_ich.
Because of this behaviour dell_ich module fails to load.
Fixed by addding dependency between modules.
Removed i2c_mux_gpio module from blacklist entry as it is not the original root case of this issue.
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.