Indexer services not able to recover from a KV node going down and rebalance

Description

1. Create a cluster of 4 nodes (node 1: kv+index, node 2: kv, node 3: index, node 4: index).
2. Create the default bucket and load 100800 documents.
3. Stop node 2, which hosts the kv service.
4. Create a secondary index.
5. From the UI, do a graceful failover of the node and then a rebalance.

Verify the following:
a) Query status. Expected: online, since the rebalance has happened.
Actual: the query is not online:
Ritams-MacBook-Pro:testrunner rsharma$ curl -u Administrator:password 172.23.106.75:8093/query -d 'statement=select * from system:indexes'
{
    "requestID": "b7363d27-14a5-4f04-95e1-6d27394f691f",
    "signature": {
        "": ""
    },
    "results": [
    ],
    "errors": [
        {
            "code": 12014,
            "msg": "error: Error Connecting KV 127.0.0.1:8091 Err 172.23.106.76:11210: dial tcp 172.23.106.76:11210: connection refused. Index employee73679e114d774875a0214b2dcd81ccd5job_title([`job_title`]). Index state: pending"
        }
    ],
    "status": "errors",
    "metrics": {
        "elapsedTime": "126.296725ms",
        "executionTime": "126.016899ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}
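
The check in (a) can be scripted. The following is a minimal sketch, not part of the original report: it takes the JSON text returned by the /query call above and reports whether every index is online. The function name `indexes_online` is hypothetical, and the assumption that each result row wraps the index record under an "indexes" key with a "state" field should be confirmed against the build under test.

```python
import json


def indexes_online(response_text):
    """Return (ok, problems) for a query-service response body.

    ok is True only when the request succeeded and every returned
    index reports state "online"; problems lists the reasons otherwise.
    """
    body = json.loads(response_text)
    problems = []
    # Errors reported by the query service, such as code 12014 above.
    for err in body.get("errors", []):
        problems.append("error {}: {}".format(err.get("code"), err.get("msg")))
    if body.get("status") != "success":
        problems.append("status is {!r}".format(body.get("status")))
    for row in body.get("results", []):
        # Assumption: each row wraps the index record under "indexes".
        index = row.get("indexes", row)
        if index.get("state") != "online":
            problems.append(
                "index {} is {}".format(index.get("name"), index.get("state"))
            )
    return (not problems, problems)
```

Fed the response shown above, this would return ok as False, with the 12014 error as the first entry in problems.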

b) The UI has multiple issues:
1. Shows 2 indexes instead of one (known issue).
2. Shows an error.

Labels

Environment

Build - 4.0.0-2093

Link to Log File, atop/blg, CBCollectInfo, Core dump

https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.75.zip
https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.77.zip
https://s3.amazonaws.com/bugdb/jira/MB-14953/collectinfo-2015-05-14T143729-ns_1@172.23.106.78.zip

Release Notes Description

None

Activity

John Liang June 24, 2015 at 5:19 PM

Merged to master.

John Liang June 17, 2015 at 6:08 PM

Fixed. Ready to be merged to unstable branch.

Cihan Biyikoglu May 19, 2015 at 10:42 PM

Given the workaround, I'll move this out of beta1.

John Liang May 19, 2015 at 10:39 PM

This happens in a narrow window, when rebalancing occurs while the indexer is still initializing its data structures. In this case, the indexer sets an error on the index. The user can drop and then recreate the index. If rebalancing happens past the initialization point, the indexer takes care of recovery (there are tests for it).

So the workaround is to drop the index.
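
The drop-and-recreate workaround could be scripted along these lines. This is a sketch under stated assumptions, not the fix itself: the helper name `recreate_index_statements` is hypothetical, the bucket and index names must be filled in from the affected cluster, and the statements assume GSI indexes as in this report.

```python
def recreate_index_statements(bucket, index_name, fields):
    """Build the N1QL statements that drop and then recreate a GSI index.

    bucket, index_name and fields are supplied by the caller; nothing
    here is looked up from the cluster.
    """
    quoted_fields = ", ".join("`{}`".format(f) for f in fields)
    drop = "DROP INDEX `{}`.`{}` USING GSI".format(bucket, index_name)
    create = "CREATE INDEX `{}` ON `{}`({}) USING GSI".format(
        index_name, bucket, quoted_fields
    )
    # Order matters: the broken index must be dropped before it is recreated.
    return [drop, create]
```

Each statement can then be submitted to the same /query endpoint used in the description, as the `statement=` form field, one request per statement.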

Cihan Biyikoglu May 19, 2015 at 10:34 PM

Doors are closing, so I'm moving this out. If we can fix it within the next 24 hours, we can take this for beta.

Fixed

Details
Assignee: John Liang
Reporter: Ritam Sharma (Deactivated)
Is this a Regression? No
Triage: Untriaged
Operating System: CentOS 64-bit
Priority: Critical
Created May 13, 2015 at 10:01 AM

Updated July 24, 2015 at 1:02 PM

Resolved June 24, 2015 at 5:19 PM
