Indexer service does not recover after a KV node goes down and the cluster is rebalanced

Description

1. Create a cluster of 4 nodes (1: kv+index, 2: kv, 3: index, 4: index).
2. Create the default bucket and load 100800 documents.
3. Stop node 2, which hosts KV.
4. Create a secondary index.
5. From the UI, do a graceful failover of that node, then rebalance.

Verify the following:
a) Query status. Expected: online, since the rebalance has completed.
Actual: the query is not online:
Ritams-MacBook-Pro:testrunner rsharma$ curl -u Administrator:password 172.23.106.75:8093/query -d 'statement=select * from system:indexes'
{
    "requestID": "b7363d27-14a5-4f04-95e1-6d27394f691f",
    "signature": {
        "": ""
    },
    "results": [
    ],
    "errors": [
        {
            "code": 12014,
            "msg": "error: Error Connecting KV 127.0.0.1:8091 Err 172.23.106.76:11210: dial tcp 172.23.106.76:11210: connection refused. Index employee73679e114d774875a0214b2dcd81ccd5job_title([`job_title`]). Index state: pending"
        }
    ],
    "status": "errors",
    "metrics": {
        "elapsedTime": "126.296725ms",
        "executionTime": "126.016899ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}

b) The UI has multiple issues:
1. Shows 2 indexes instead of one (known issue).
2. Shows an error.
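
Check a) above can be scripted rather than read by eye. The following is a minimal sketch, not part of the original report: it parses a query service response and reports whether the request succeeded and whether any error mentions an index in the pending state. The function names are hypothetical, the sample body is the response from this report with the metrics section left out, and the sketch assumes a successful request reports "status": "success".

```python
import json

# Response body copied from the report above (metrics section omitted).
SAMPLE_RESPONSE = """
{
  "requestID": "b7363d27-14a5-4f04-95e1-6d27394f691f",
  "signature": {"": ""},
  "results": [],
  "errors": [
    {
      "code": 12014,
      "msg": "error: Error Connecting KV 127.0.0.1:8091 Err 172.23.106.76:11210: dial tcp 172.23.106.76:11210: connection refused. Index employee73679e114d774875a0214b2dcd81ccd5job_title([`job_title`]). Index state: pending"
    }
  ],
  "status": "errors"
}
"""


def query_succeeded(response_text):
    """Return True when the response reports success and carries no errors."""
    body = json.loads(response_text)
    # Assumption: a successful request has "status": "success".
    return body.get("status") == "success" and not body.get("errors")


def pending_index_errors(response_text):
    """Return the error messages that report an index stuck in the pending state."""
    body = json.loads(response_text)
    return [
        err["msg"]
        for err in body.get("errors", [])
        if "Index state: pending" in err.get("msg", "")
    ]


if __name__ == "__main__":
    print("query succeeded:", query_succeeded(SAMPLE_RESPONSE))
    for msg in pending_index_errors(SAMPLE_RESPONSE):
        print("pending index error:", msg)
```

To run it against a live cluster, pass in the body returned by the curl command in a) instead of SAMPLE_RESPONSE.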

Environment

Build - 4.0.0-2093

Link to Log File, atop/blg, CBCollectInfo, Core dump

https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.75.zip
https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.77.zip
https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.78.zip

Release Notes Description

None

Activity

John Liang June 24, 2015 at 5:19 PM

Merged to master.

John Liang June 17, 2015 at 6:08 PM

Fixed. Ready to be merged to unstable branch.

Cihan Biyikoglu May 19, 2015 at 10:42 PM

Given the workaround, I'll move this out of beta1.

John Liang May 19, 2015 at 10:39 PM

This happens in a narrow window, when rebalancing occurs while the indexer is still initializing its data structures. In this case, the indexer sets an error on the index. The user can drop and then recreate the index. If rebalancing happens past the initialization point, the indexer takes care of recovery (there are tests for it).

So the workaround is to drop the index and recreate it.
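
The drop-and-recreate workaround can be sketched as a pair of N1QL statements. This is an illustrative sketch only: the helper name is hypothetical, and it assumes the failed index was a plain GSI index on job_title in the default bucket, which is what the error message in the description suggests. The original CREATE INDEX statement is not included in this report.

```python
def workaround_statements(bucket, index_name, field):
    """Build the DROP INDEX / CREATE INDEX pair for the workaround.

    Assumption: the index is a single-key GSI index; adjust the CREATE
    statement if the original definition differed.
    """
    for name in (bucket, index_name, field):
        # Keep the sketch simple: refuse names that would need escaping.
        if "`" in name:
            raise ValueError("backticks are not supported in names: %r" % name)

    def quote(name):
        return "`" + name + "`"

    return [
        "DROP INDEX %s.%s USING GSI" % (quote(bucket), quote(index_name)),
        "CREATE INDEX %s ON %s(%s) USING GSI"
        % (quote(index_name), quote(bucket), quote(field)),
    ]


if __name__ == "__main__":
    # Index name and key taken from the error message in the description.
    for statement in workaround_statements(
        "default",
        "employee73679e114d774875a0214b2dcd81ccd5job_title",
        "job_title",
    ):
        print(statement)
```

Each statement can be submitted to the query service in the same way as the curl call in the description, through the statement parameter.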

Cihan Biyikoglu May 19, 2015 at 10:34 PM

Doors are closing, so moving this out. If we can fix this within the next 24 hours, we can take it for beta.

Fixed
Details

Is this a Regression?

No

Triage

Untriaged

Operating System

CentOS 64-bit

Created May 13, 2015 at 10:01 AM
Updated July 24, 2015 at 1:02 PM
Resolved June 24, 2015 at 5:19 PM