Changwei Ge 1c01967116 ocfs2: fix cluster hang after a node dies
When a node dies, the other live nodes have to choose a new master for
each existing lock resource mastered by the dead node.

In the ocfs2/dlm implementation, this is done by
dlm_move_lockres_to_recovery_list, which marks those lock resources as
DLM_LOCK_RES_RECOVERING and puts them on a list from which DLM later
changes each lock resource's master.

So without invoking dlm_move_lockres_to_recovery_list, no master will be
chosen after dlm recovery completes, since no lock resource can be
found through the ::resource list.

What's worse, if DLM_LOCK_RES_RECOVERING is not set on lock resources
mastered by a dead node, synchronization among nodes breaks.

So invoke dlm_move_lockres_to_recovery_list again.
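
The dependency described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel code:
every toy_* name, RES_RECOVERING, and MAX_RES are hypothetical stand-ins
invented for illustration, and the real ocfs2/dlm structures, locking, and
messaging are not modeled. It shows only that a lock resource which is never
queued on the recovery list keeps pointing at the dead node.

```c
#include <assert.h>
#include <stddef.h>

#define RES_RECOVERING 0x1u /* hypothetical stand-in for DLM_LOCK_RES_RECOVERING */
#define MAX_RES 8           /* arbitrary capacity for this toy list */

/* Hypothetical, heavily simplified lock resource. */
struct toy_lockres {
    int owner;          /* node number currently mastering the resource */
    unsigned int state; /* bit flags such as RES_RECOVERING */
};

/* Hypothetical recovery list: the only place recovery looks for work. */
struct toy_reco_list {
    struct toy_lockres *items[MAX_RES];
    size_t count;
};

/* Toy analogue of dlm_move_lockres_to_recovery_list:
 * flag the resource and queue it for remastering. */
static void toy_move_to_recovery_list(struct toy_reco_list *list,
                                      struct toy_lockres *res)
{
    assert(list->count < MAX_RES);
    res->state |= RES_RECOVERING;
    list->items[list->count++] = res;
}

/* Toy recovery pass: only resources on the list receive a new master.
 * Returns how many resources were remastered. */
static size_t toy_finish_recovery(struct toy_reco_list *list, int new_master)
{
    size_t remastered = 0;

    for (size_t i = 0; i < list->count; i++) {
        list->items[i]->owner = new_master;
        list->items[i]->state &= ~RES_RECOVERING;
        remastered++;
    }
    list->count = 0;
    return remastered;
}
```

In this model, a resource passed to toy_move_to_recovery_list ends up owned
by the new master after toy_finish_recovery, while a resource that was never
queued still names the dead node as its owner.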

Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:01 -08:00