[frr]: lib: Fix corruption when routemap delete/add sequence happens (#16456)
Why I did it Zebra core sometimes seen during config reload. Series of route-map deletions and then re-adds, and this triggers the hash table to realloc to grow to a larger size, then subsuquent route-map operations will be against a corrupted hash table. Issue is seen when we have BFD Enable on Static Route table we see Static route-map being created/deleted based on bfd session state. However issue itself is very generic from FRR perspective. Thie issue has detailed core info sonic-net/sonic-frr#37 . This PR fixes this issue. Fixes#sonic-net/sonic-frr#37 Work item tracking Microsoft ADO (17952227): How I did it This fix is already in Master frr/8.2.5. Porting this fix to 202205 branch to address this Zebra core. sonic-net/sonic-frr@5f503e5 Solution: The whole purpose of the delay of deletion and the storage of the route-map is to allow the using protocol the ability to process the route-map at a later time while still retaining the route-map name( for more efficient reprocessing ). The problem exists because we are keeping multiple copies of deletion events that are indistinguishable from each other causing hash havoc. How to verify it Verified running sonic-mgmt test, doing multiple config reloads.
This commit is contained in:
parent
2b302e83c0
commit
dbfa8f9660
@ -0,0 +1,136 @@
|
||||
From 5f503e5f5aecc946a168c87f3e02757deb65cbeb Mon Sep 17 00:00:00 2001
|
||||
From: Donald Sharp <sharpd@nvidia.com>
|
||||
Date: Wed, 2 Mar 2022 15:41:54 -0500
|
||||
Subject: [PATCH] lib: Fix corruption when routemap delete/add sequence happens
|
||||
|
||||
If a operator issues a series of route-map deletions and
|
||||
then re-adds, *and* this triggers the hash table to realloc
|
||||
to grow to a larger size, then subsuquent route-map operations
|
||||
will be against a corrupted hash table.
|
||||
|
||||
Why?
|
||||
|
||||
Effectively the route-map code was inserting each
|
||||
route-map <NAME> into a hash for storage. Upon
|
||||
deletion there is this concept of delayed processing
|
||||
so the routemap code sets a bit `to-be-processed`
|
||||
and marks the route-map for deletion. This is
|
||||
1 entry in the hash table. Then if the operator
|
||||
recreates the hash, FRR would add another hash
|
||||
entry. If another deletion happens then there
|
||||
now are 2 deletion entries that are indistinguishable
|
||||
from a hash perspective.
|
||||
|
||||
FRR stores the deleted name of the route-map so that
|
||||
any delayed processing can lookup the name and only process
|
||||
those peers that are related to that route-map name.
|
||||
This is good as that if in say BGP, we do not want
|
||||
to reprocess all the peers that don't use the route-map.
|
||||
|
||||
Solution:
|
||||
The whole purpose of the delay of deletion and the
|
||||
storage of the route-map is to allow the using protocol
|
||||
the ability to process the route-map at a later time
|
||||
while still retaining the route-map name( for more efficient
|
||||
reprocessing ). The problem exists because we are keeping
|
||||
multiple copies of deletion events that are indistinguishable
|
||||
from each other causing hash havoc.
|
||||
|
||||
The truth is that we only need to keep 1 copy of the
|
||||
routemap in the table. If the series of events is:
|
||||
a) delete ( schedule processing )
|
||||
b) add ( reschedule processing )
|
||||
|
||||
Current code ends up processing the route-map two times
|
||||
and in this event we really just need to reprocess everything
|
||||
with the new route-map.
|
||||
|
||||
If the series of events is:
|
||||
a) delete (schedule processing )
|
||||
b) add (reschedule)
|
||||
c) delete (reschedule)
|
||||
d) add (reschedule)
|
||||
|
||||
All this really points to is that FRR just needs to keep the last
|
||||
in the series of maps and ensuring that FRR knows that we need
|
||||
to continue processing the route-map. So in the creation processing
|
||||
if the hash has an entry for this map, the routemap code knows that
|
||||
this is a deletion event. Mark this route-map for later processing
|
||||
if it was marked so. Also in the lookup function do not return
|
||||
a map if the map found was deleted.
|
||||
|
||||
Fixes: #10708
|
||||
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
||||
|
||||
---
|
||||
lib/routemap.c | 25 ++++++++++++++-----------
|
||||
1 file changed, 14 insertions(+), 11 deletions(-)
|
||||
|
||||
|
||||
diff --git a/lib/routemap.c b/lib/routemap.c
|
||||
index 7f733c811..8f343ccd9 100644
|
||||
--- a/lib/routemap.c
|
||||
+++ b/lib/routemap.c
|
||||
@@ -101,6 +101,7 @@ static void route_map_del_plist_entries(afi_t afi,
|
||||
|
||||
static struct hash *route_map_get_dep_hash(route_map_event_t event);
|
||||
|
||||
+static void route_map_free_map(struct route_map *map);
|
||||
struct route_map_match_set_hooks rmap_match_set_hook;
|
||||
|
||||
/* match interface */
|
||||
@@ -566,15 +567,8 @@ static bool route_map_hash_cmp(const void *p1, const void *p2)
|
||||
const struct route_map *map1 = p1;
|
||||
const struct route_map *map2 = p2;
|
||||
|
||||
- if (map1->deleted == map2->deleted) {
|
||||
- if (map1->name && map2->name) {
|
||||
- if (!strcmp(map1->name, map2->name)) {
|
||||
- return true;
|
||||
- }
|
||||
- } else if (!map1->name && !map2->name) {
|
||||
- return true;
|
||||
- }
|
||||
- }
|
||||
+ if (!strcmp(map1->name, map2->name))
|
||||
+ return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
@@ -636,13 +630,18 @@ static struct route_map *route_map_new(const char *name)
|
||||
/* Add new name to route_map. */
|
||||
static struct route_map *route_map_add(const char *name)
|
||||
{
|
||||
- struct route_map *map;
|
||||
+ struct route_map *map, *exist;
|
||||
struct route_map_list *list;
|
||||
|
||||
map = route_map_new(name);
|
||||
list = &route_map_master;
|
||||
|
||||
/* Add map to the hash */
|
||||
+ exist = hash_release(route_map_master_hash, map);
|
||||
+ if (exist) {
|
||||
+ map->to_be_processed = exist->to_be_processed;
|
||||
+ route_map_free_map(exist);
|
||||
+ }
|
||||
hash_get(route_map_master_hash, map, hash_alloc_intern);
|
||||
|
||||
/* Add new entry to the head of the list to match how it is added in the
|
||||
@@ -752,11 +751,15 @@ struct route_map *route_map_lookup_by_name(const char *name)
|
||||
if (!name)
|
||||
return NULL;
|
||||
|
||||
- // map.deleted is 0 via memset
|
||||
+ // map.deleted is false via memset
|
||||
memset(&tmp_map, 0, sizeof(struct route_map));
|
||||
tmp_map.name = XSTRDUP(MTYPE_ROUTE_MAP_NAME, name);
|
||||
map = hash_lookup(route_map_master_hash, &tmp_map);
|
||||
XFREE(MTYPE_ROUTE_MAP_NAME, tmp_map.name);
|
||||
+
|
||||
+ if (map && map->deleted)
|
||||
+ return NULL;
|
||||
+
|
||||
return map;
|
||||
}
|
||||
|
@ -15,3 +15,4 @@ Disable-ipv6-src-address-test-in-pceplib.patch
|
||||
0028-bgpd-Ensure-that-bgp-open-message-stream-has-enough-data-to-read.patch
|
||||
0029-bgpd-Change-log-level-for-graceful-restart-events.patch
|
||||
0030-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch
|
||||
0031-lib-Fix-corruption-when-routemap-delete-add-sequence.patch
|
||||
|
Reference in New Issue
Block a user