[Magma] - Opening bucket during recovery takes a long time

Description

Script to Repro

Steps to Repro
1. Create a 5 node cluster
2021-07-13 15:05:12,207 | test | INFO | pool-3-thread-6 | [table_view:display:72] Rebalance Overview
+-----------------+----------+-----------------------+---------------+--------------+
| Nodes           | Services | Version               | CPU           | Status       |
+-----------------+----------+-----------------------+---------------+--------------+
| 172.23.98.196   | kv       | 7.1.0-1066-enterprise | 1.73779708663 | Cluster node |
| 172.23.98.195   | None     |                       |               | <--- IN ---  |
| 172.23.120.206  | None     |                       |               | <--- IN ---  |
| 172.23.104.186  | None     |                       |               | <--- IN ---  |
| 172.23.120.201  | None     |                       |               | <--- IN ---  |
+-----------------+----------+-----------------------+---------------+--------------+

2. Create bucket/scopes/collections/data
2021-07-13 15:10:04,253 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+---------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-------------+------------+------------+
| Bucket                                                        | Type      | Storage Backend | Replicas | Durability | TTL | Items   | RAM Quota   | RAM Used   | Disk Used  |
+---------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-------------+------------+------------+
| r4dK9wYeZBZ%iqABnPUVxh6g5r2Sj4CAH8N89or9vhAsV3Sv3AK-22-897000 | couchbase | magma           | 2        | none       | 0   | 3000000 | 10485760000 | 2407566136 | 1992997674 |
+---------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-------------+------------+------------+

3. Do a graceful failover/delta recovery/rebalance of node 172.23.104.186.

4. Do a graceful failover/delta recovery/rebalance of another node, 172.23.120.201 (a REST sketch of this failover/recovery/rebalance cycle follows the notes below).

This rebalance fails as shown below.

Some points worth noting:
1. This is fairly easy to reproduce, but not on every run: we hit it in at least 3 out of 10 attempts across the 10-15 runs I have done, with each run containing 15-20 tests.
2. It is only seen on the second graceful failover/delta recovery/rebalance (step 4); when it does occur on the first graceful failover/delta recovery/rebalance (step 3), it fails like .
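For reference, the bucket setup and the failover/delta recovery/rebalance cycle in steps 2-4 can be driven through the cluster REST API. The sketch below is illustrative only and is not the actual test script; the credentials, RAM quota, and choice of orchestrator node are assumptions, and it uses the standard /pools/default/buckets, /controller/startGracefulFailover, /controller/setRecoveryType and /controller/rebalance endpoints.

```python
# Illustrative driver for steps 2-4; node IPs, credentials and sizing are placeholders.
import time
import requests

AUTH = ("Administrator", "password")            # placeholder credentials
ORCH = "http://172.23.98.196:8091"              # orchestrator node from the table above


def create_magma_bucket(name):
    # Step 2: couchbase bucket on the magma storage backend with 2 replicas.
    requests.post(f"{ORCH}/pools/default/buckets", auth=AUTH, data={
        "name": name,
        "bucketType": "couchbase",
        "storageBackend": "magma",
        "replicaNumber": 2,
        "ramQuota": 10000,                      # MB; roughly the 10 GB quota in the stats table
    }).raise_for_status()


def failover_delta_recover_rebalance(node_ip, all_node_ips):
    # Steps 3/4: graceful failover -> delta recovery -> rebalance for a single node.
    otp = f"ns_1@{node_ip}"
    requests.post(f"{ORCH}/controller/startGracefulFailover",
                  auth=AUTH, data={"otpNode": otp}).raise_for_status()
    wait_for_rebalance()                        # graceful failover progress is reported like a rebalance
    requests.post(f"{ORCH}/controller/setRecoveryType", auth=AUTH,
                  data={"otpNode": otp, "recoveryType": "delta"}).raise_for_status()
    requests.post(f"{ORCH}/controller/rebalance", auth=AUTH, data={
        "knownNodes": ",".join(f"ns_1@{ip}" for ip in all_node_ips),
        "ejectedNodes": "",
    }).raise_for_status()
    wait_for_rebalance()


def wait_for_rebalance(poll_secs=10):
    while True:
        progress = requests.get(f"{ORCH}/pools/default/rebalanceProgress", auth=AUTH).json()
        if progress.get("status") == "none":    # "none" once the operation has finished
            return
        time.sleep(poll_secs)
```

It is the second call of this failover/recovery/rebalance cycle (step 4) that hits the failure described above.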

cbcollect_info attached.

Environment

7.1.0-1066-enterprise

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Attachments

3

Activity

Steve Watanabe October 18, 2021 at 10:06 PM

Is there a way for ns_server to determine, via stats, that the bucket is ready to have DCP connections created successfully?

Ben Huddleston October 18, 2021 at 9:57 PM

It is indeed. https://github.com/couchbase/kv_engine/blob/master/engines/ep/src/warmup.h#L112-L219 gives a good description of the various phases of warmup. After the PopulateVBucketMap phase, vBucket stats should be retrievable. However, it's not until the Done state that DCP consumers are creatable. The bulk of warmup (certainly for couchstore buckets) is in the data-loading phases that run after PopulateVBucketMap. The changes I mentioned earlier didn't alter this, though: they changed it so that after we enable traffic (mutations etc.) we also have to wait for all warmup threads to finish before we accept a DCP consumer (to prevent a race condition).

(Side note - the comment on the isFinishedLoading() function allowing creation of DcpConsumers is out of date; it should be isComplete() now - will update.)
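For illustration, below is a rough sketch of what a stats-based readiness check could look like, based on the warmup phases described above. It shells out to cbstats and reads the warmup stat group (ep_warmup_state / ep_warmup_thread); the exact path, flags, and stat names should be verified against the installed build, and this is not how ns_server currently decides readiness.

```python
# Rough readiness check: poll the "warmup" stat group until the warmup thread
# reports "complete" (the Done state above), at which point DCP consumers
# should be creatable. The cbstats path and flags are assumptions for this sketch.
import subprocess
import time

CBSTATS = "/opt/couchbase/bin/cbstats"


def warmup_stats(host, bucket, user="Administrator", password="password"):
    out = subprocess.check_output(
        [CBSTATS, f"{host}:11210", "warmup", "-b", bucket, "-u", user, "-p", password],
        text=True)
    stats = {}
    for line in out.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            stats[key.strip()] = value.strip()
    return stats


def wait_for_dcp_ready(host, bucket, poll_secs=5):
    # vBucket stats can already be visible after PopulateVBucketMap, so checking
    # vbucket state alone is not sufficient; wait for the warmup thread itself.
    while warmup_stats(host, bucket).get("ep_warmup_thread") != "complete":
        time.sleep(poll_secs)
```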

Steve Watanabe October 18, 2021 at 9:45 PM

Node .105 determines the state of "default" by obtaining stats (command 0x10) from memcached on node .111. We do this to "poll" the state of the bucket on the other node. At some point memcached returns the state of each vbucket in the stats; e.g. here are some debug logs.

and ns_server considers this to mean the bucket is ready. Is it possible memcached is returning the vbucket stats when it hasn't completed all its warmup activities (which means it'll return ETMPFAIL for DCP connections)?

Steve Watanabe October 15, 2021 at 9:39 PM
Edited

Looking at the debug.log for node .111, we see the bucket loaded and a dcp connection being started, which errors out because (per Ben) the bucket is not yet fully warmed up.

And then the bucket does get marked as warmed.

So it seems as if we're prematurely trying to establish the dcp connection. On node .105 we see the bucket has become ready, and so dcp connections will be set up (and fail). So it appears node .105's determination of the bucket being ready is wrong.

Ben Huddleston October 15, 2021 at 7:39 AM

http://review.couchbase.org/c/kv_engine/+/162170 feels like a likely cause for this.

Looking at the logs picked out, "Engine warmup is complete" is a bit of a rubbish log message: I think it gets logged per shard rather than per bucket, and with the change linked above we now require all of the warmup tasks to actually finish before the bucket allows a DCP consumer to be created. We actually want to look at the last occurrence of that message (we then see a DCPConsumer successfully open after that).

Hopefully this is enough info for you to go on. When exactly is ns_server expecting to be able to open a DCPConsumer?
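As a quick way to find the last occurrence mentioned above in a cbcollect, something like the following (purely illustrative, not part of any tooling) prints the final "Engine warmup is complete" line from a memcached.log rather than the first:

```python
# Print the LAST "Engine warmup is complete" line in a memcached.log, since the
# message is emitted per shard rather than per bucket.
def last_warmup_complete(log_path):
    last = None
    with open(log_path, errors="replace") as log:
        for line in log:
            if "Engine warmup is complete" in line:
                last = line.rstrip()
    return last


if __name__ == "__main__":
    print(last_warmup_complete("memcached.log"))
```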

Steve Watanabe October 14, 2021 at 7:36 PM

This appears to be a different cause. Looking at node 107.105, the janitor_agent has died while attempting to establish a dcp connection to node 107.111, which fails with an ETMPFAIL.

Looking at the memcached.log on node 107.111 I don't see what may be leading to the error (with my untrained memcached-eye).

Assigning this ticket to kv for further triage. Feel free to return the ticket to me afterwards.

Steve Watanabe September 20, 2021 at 9:48 PM

Based on the updates above, where it was asked "We will address this post Neo. Please let us know if that is OK." and the response was "yes post Neo is fine.", I'm changing the target to Morpheus.

Meni Hillel September 18, 2021 at 4:40 AM

For Neo, it seems that is the best we can do - increase the timeout. What would be an acceptable timeout under extreme conditions? More importantly, we should probably document what one may need to do if the timeout hits. Is there a way to estimate the duration of a given request?
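On estimating duration: one rough option (a sketch only, reusing the warmup_stats() helper from the sketch further up this page; the ep_warmup_*_count stat names are assumptions based on the warmup stat group) is to sample how many keys warmup has loaded against its estimated total and extrapolate the remaining time:

```python
# Rough warmup-progress estimate: sample the loaded key count twice and
# extrapolate time remaining from the estimated total. Stat names are assumed.
import time


def estimate_remaining_secs(host, bucket, sample_gap_secs=10):
    first = warmup_stats(host, bucket)           # helper from the sketch further up
    time.sleep(sample_gap_secs)
    second = warmup_stats(host, bucket)

    loaded_before = int(first.get("ep_warmup_key_count", 0))
    loaded_now = int(second.get("ep_warmup_key_count", 0))
    total = int(second.get("ep_warmup_estimated_key_count", 0))

    rate = (loaded_now - loaded_before) / sample_gap_secs   # keys loaded per second
    if rate <= 0:
        return None                                         # no measurable progress yet
    return (total - loaded_now) / rate
```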

Shivani Gupta September 17, 2021 at 6:19 PM

Ok, that's good. So from an end-user perspective it is not bucket creation that can take long and time out, which is good, because bucket creation will be frequent on slower developer-type VMs where people are just trying it out.

I suggest we change the title of the ticket to 'Opening a Magma bucket during recovery takes a long time'. This issue is about opening an existing bucket, not about creating a new bucket.

Sarath Lakshman September 17, 2021 at 6:15 PM
Edited

Bucket creation should not take time. It will take time only in the case of a bucket with existing data. For an empty bucket, it won't create even a single sstable until data is inserted. The ns_server/kv team can clarify cases other than delta recovery.

Shivani Gupta September 17, 2021 at 6:09 PM
Edited

So can the creation of a new bucket also take this long? Actually, I am not sure why the ticket title is 'Decrease creation time of bucket'.

Resolution

Fixed

Details

Is this a Regression?

No

Triage

Triaged

Operating System

CentOS 64-bit

Issue Impact

external

Created July 14, 2021 at 7:55 AM
Updated 4 days ago
Resolved February 10, 2022 at 3:12 PM