A Detailed Look at Timeout Values During VxRail Cluster Startup

Introduction

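When a VxRail cluster is shut down with the VxRail cluster shutdown feature, the next power-on automatically takes the member nodes out of maintenance mode, re-enables vSAN, and then starts the system VMs.
This behavior is provided by a script that runs at boot time, namely the following file: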

/etc/rc.local.d/998.start_vm.py

The script contains several wait phases that keep this sequence stable. This article investigates the timeout value used in each of those wait phases.

Disclaimer

This article is based on an analysis of the script found on a node running 7.0.320.
The findings are best effort, and their accuracy and completeness are not guaranteed.

Walking Through the Script

The following part of the script is the main function:

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', dest='debug', help='Debug mode (no run in background)', action='store_true')

    args = parser.parse_args()
    if not args.debug:
        daemonize()
    do_start_vms()

main() calls a function named do_start_vms:

def do_start_vms():
    """
    Wait for VSAN ready.. only wait if VSAN has MARVIN_CONFIGURED_VSAN_PREFIX prefix
    This waits two things.
    a) the node status to become HEALTHY
    b) all listed system VMs to become connected (VM home namespace is ready) and
    all the DOM owners are part of the cluster
    """
    utils.waitHostdReady()  # <<< waits until hostd on this node is ready (primary and secondary nodes alike)

The docstring of do_start_vms suggests two wait phases, but the script actually waits at more points than that.

The utils.waitHostdReady() call right after the docstring is the first wait phase. This function is defined in a separate file:
sys.path.append("/usr/lib/vmware/")
from marvin import utils # pylint: disable=F0401
These lines suggest that a directory named marvin exists under /usr/lib/vmware/ and that it contains utils.py or utils.pyc.
In fact, /usr/lib/vmware/marvin/utils.py did exist.

The function is defined as follows. It waits up to 300 seconds for hostd to become ready, retrying the connection every 5 seconds:

def waitHostdReady(timeout=300):  # <<< waits up to 300 seconds for the hostd service to become ready
    expiry = time.time() + timeout
    while expiry > time.time():
        try:
            c = connect.Connect(user='vpxuser')
            connect.Disconnect(c)
            return True
        except Exception:
            pass
        time.sleep(5)
    raise Exception("Hostd is not ready after timeout.")
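
To see this retry behavior in isolation, here is a minimal sketch with the same loop shape. It is not the marvin code: wait_until_ready, probe, FakeClock, and make_probe are hypothetical names introduced for illustration, and the injected clock only exists so that the example finishes instantly instead of really sleeping.

```python
import time


def wait_until_ready(probe, timeout=300, poll_interval=5,
                     clock=time.time, sleep=time.sleep):
    """Call probe() until it stops raising, or fail once `timeout` seconds pass."""
    expiry = clock() + timeout
    attempts = 0
    while expiry > clock():
        attempts += 1
        try:
            probe()
            return attempts
        except Exception:
            pass  # any exception is treated as "not ready yet"
        sleep(poll_interval)
    raise Exception("not ready after timeout")


class FakeClock:
    """Hypothetical clock: sleep() only advances a counter."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_probe(fail_times):
    """Return a probe that raises `fail_times` times and then succeeds."""
    state = {"calls": 0}

    def probe():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("hostd not ready")

    return probe


clock = FakeClock()
attempts = wait_until_ready(make_probe(3), clock=clock.time, sleep=clock.sleep)
print(attempts, clock.now)  # 4 15.0
```

With an instantaneous probe that never succeeds, this loop makes 60 attempts before raising (300 seconds divided by the 5-second sleep). On a real node each connection attempt takes time as well, so the number of attempts would be lower.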

Back in the original script (/etc/rc.local.d/998.start_vm.py), the next wait phase appears in the following part:

    # skip the operations for EMRS cluster
    if conn.is_vsan_enabled():
        #If primary node
        filepath = fetch_file_path()
        log.info('Target path is {}'.format(filepath))

        if not timesync.encrypt_decrypt_file(log, filepath, 'd'):
            log.error('Fail to decrypt reboot_host file')
            return

        if os.path.isfile(filepath):  # <<< determines whether this node is the primary node
            with open(filepath, 'r') as contents:
                lines = [line.rstrip('\n') for line in contents]
                params = [VsanTaskParam(line) for line in lines]

            # Wait until all hosts are available
            verify_host_availability(params)  # <<< waits for the nodes other than the primary node (primary node only)
            log.info('All hosts are available')

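The details are omitted here, but /etc/rc.local.d/998.start_vm.py is designed to behave differently on the primary node and on the other nodes. A node decides whether it is the primary node by checking whether a specific file exists. The primary node waits for the secondary nodes to come up and then runs reboot_helper recover and starts the system VMs.
In the excerpt above, the primary node waits for the secondary hosts to come up in verify_host_availability: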

def verify_host_availability (params) :
    log.info('Wait for all hosts available')

    size = len(params)  # <<< length of the params array (secondary node information) = number of nodes
    log.info('verify_host_availability size is {}'.format(size))

    while True:  # <<< loops forever until success equals the number of nodes (size)
        success = 0
        hosts_conn = None
        for param in params:
            try:
                hosts_conn = connect.Connect(host=param.host_addr, user=param.username, pwd=param.password, version=VIM_VERSION)
                hosts_conn._stub.ComputeVersionInfo(VIM_VERSION)
                _root = hosts_conn.content.rootFolder
                host_obj = _root.childEntity[0].hostFolder.childEntity[0].host[0]
                log.info('host name {} is connected'.format(host_obj.name))
                if not host_obj.runtime.inMaintenanceMode:
                    success += 1
            except Exception as e:  # <<< exceptions are caught here, so the while loop simply continues
                log.info('Connection failed due to {}'.format(e))
            finally:
                if hosts_conn is not None:
                    connect.Disconnect(hosts_conn)
        if success == size:  # <<< exits only when every host is connected and out of maintenance mode
            break

No timeout could be found in this phase, so it appears to be an infinite loop.
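
The practical consequence can be illustrated with a small simulation that has the same loop shape. This is only a sketch: wait_all_hosts, make_fake_cluster, the node names, and the max_rounds parameter are all hypothetical. In particular, max_rounds is an addition that the quoted script does not have, included only so that the stuck case terminates in the example.

```python
def wait_all_hosts(hosts, is_ready, max_rounds=None):
    """One pass over all hosts per round; return the round in which every host was ready.

    max_rounds is a demo-only escape hatch (None means loop forever, like the
    quoted script). Returns None if max_rounds is exhausted.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        success = 0  # reset every round: all hosts must be ready in the same pass
        for host in hosts:
            try:
                if is_ready(host):
                    success += 1
            except Exception as exc:
                # A failing host does not abort the loop; it just is not counted.
                print('Connection failed due to {}'.format(exc))
        if success == len(hosts):
            return rounds
    return None


def make_fake_cluster(ready_after):
    """ready_after maps host -> poll count at which it becomes ready (None = never reachable)."""
    polls = {host: 0 for host in ready_after}

    def is_ready(host):
        polls[host] += 1
        if ready_after[host] is None:
            raise ConnectionError('{} is unreachable'.format(host))
        return polls[host] >= ready_after[host]

    return is_ready


hosts = ['node-2', 'node-3']
recovered = wait_all_hosts(hosts, make_fake_cluster({'node-2': 1, 'node-3': 3}))
stuck = wait_all_hosts(hosts, make_fake_cluster({'node-2': 1, 'node-3': None}), max_rounds=5)
print(recovered, stuck)  # 3 None
```

In the second call one host never becomes reachable, so the loop only ends because of the demo-only max_rounds. Without such a bound, a single host that stays down or stays in maintenance mode would keep the primary node waiting indefinitely.
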
After that, the script runs reboot_helper recover in the following part. A 5-minute timeout is defined, and the call is repeated within that window until it succeeds:

    expired_time = datetime.timedelta(minutes=5)  # <<< the reboot_helper timeout is 5 minutes
    current_time = datetime.datetime.now()
    while True:
        if datetime.datetime.now() - current_time > expired_time:  # <<< the while loop keeps retrying for 5 minutes
            log.info('Failed to invoke reboot_help script')
            return
        log.info('call run_reboot_helper to recover')
        result = run_reboot_helper()
        log.info('Result of invoke reboot_helper script {}'.format(result))
        if result:
            break
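
One detail of this loop is that the elapsed time is checked only before each attempt, so an attempt that starts just inside the window still runs to completion. The sketch below shows this with a fake clock. retry_for, FakeNow, and the 90 seconds per attempt are hypothetical values chosen for illustration and say nothing about how long run_reboot_helper really takes.

```python
import datetime


def retry_for(action, window=datetime.timedelta(minutes=5),
              now=datetime.datetime.now):
    """Call action() until it returns a truthy value or the window has elapsed.

    Returns (succeeded, number_of_calls). The window is checked only
    between attempts, as in the quoted loop.
    """
    started = now()
    calls = 0
    while True:
        if now() - started > window:
            return False, calls
        calls += 1
        if action():
            return True, calls


class FakeNow:
    """Hypothetical clock so the example does not wait in real time."""

    def __init__(self):
        self.start = datetime.datetime(2024, 1, 1)
        self.current = self.start

    def __call__(self):
        return self.current

    def advance(self, seconds):
        self.current += datetime.timedelta(seconds=seconds)


fake_now = FakeNow()


def always_fails():
    fake_now.advance(90)  # pretend each attempt takes 90 seconds
    return False


ok, calls = retry_for(always_fails, now=fake_now)
elapsed = fake_now.current - fake_now.start
print(ok, calls, elapsed)  # False 4 0:06:00
```

The fourth attempt starts at 270 seconds, which is still inside the window, and ends at 360 seconds. Under these assumptions the phase therefore lasts 6 minutes, not exactly 5: the effective upper bound is 5 minutes plus the duration of one attempt.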

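Next, the script checks the vSAN health of the node itself. The inline comment says "wait forever", which suggests that this phase has no timeout either. Note, however, that wait_vsan_host_health as quoted below calls __wait without a timeout argument, and the __wait helper quoted later in this article defaults to timeout=3600. Going by the quoted code alone, this phase would give up after 3600 seconds unless the caller handles the exception and retries, which the excerpt does not show.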

    wait_vsan_host_health(conn) # wait forever

def wait_vsan_host_health(conn):
    vsan_system = conn.host.configManager.vsanSystem
    def __check_host_health():
        status = vsan_system.QueryHostStatus()
        return status.health == vim.VsanHostHealthState.healthy
    __wait(__check_host_health)

Once vSAN health is healthy, the last step is to start the system VMs.
A system VM can only be started once its related vSAN objects are accessible, so the script has to wait for that as well:

    # then wait for VM name spaces
    system_vms = get_system_vm_list(conn, config.get('systemVM', []))
    if system_vms:
        wait_system_vm_ready_to_power_on(conn, system_vms)
        # now we should be able to start them all
        start_system_vms(conn, system_vms, config.get('systemVM', []))

def wait_system_vm_ready_to_power_on(conn, vms):
    ready_sys_vms = set()
    def __check_vms_status():
        if len(ready_sys_vms) >= len(vms):
            return True
        for vm in vms:
            if vm.config is None:
                log.warn('VM config is invalid.')
                continue
            if vm.config.instanceUuid in ready_sys_vms:
                continue
            # PR1908837:
            # We found that even the VM is fully functional.
            # The SDK dosen't reflect the status.
            # So let's just ensure we have the namespace OK.
            # further reduce the VM starting condition to only we have VM namespace.
            if vm.summary.runtime.connectionState == vim.VirtualMachineConnectionState.connected:
                log.info('The VM of "{0}" is ready to power on'.format(vm.config.instanceUuid))
                ready_sys_vms.add(vm.config.instanceUuid)
        return False
    return __wait(__check_vms_status, interval=5)  # <<< uses the __wait function

def __wait(check_func, timeout=3600, interval=10):  # <<< times out after 3600 seconds by default
    """
    @param check_func: the callback to check if the wait condition has met
    @param timeout: the time in seconds to wait for. None means forever
    @param interval: the interval to check status
    """
    expiry = None if timeout is None else time.time() + timeout
    while expiry is None or time.time() < expiry:
        if check_func():
            return
        time.sleep(interval)
    raise TimeoutException('Timeout')

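Many lines are involved, but in the end the __wait function is what implements the wait, and it has a default timeout of 3600 seconds. wait_system_vm_ready_to_power_on overrides only the interval (5 seconds), so the default timeout applies.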
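
The timeout and interval semantics can be checked with a stand-in that has the same loop shape. This is a sketch, not the vendor function: wait_for, FakeClock, count_checks, and the injected clock and sleep parameters are hypothetical and exist only so that the example runs instantly.

```python
import time


class TimeoutException(Exception):
    pass


def wait_for(check_func, timeout=3600, interval=10,
             clock=time.time, sleep=time.sleep):
    """Poll check_func every `interval` seconds; timeout=None means wait forever."""
    expiry = None if timeout is None else clock() + timeout
    while expiry is None or clock() < expiry:
        if check_func():
            return
        sleep(interval)
    raise TimeoutException('Timeout')


class FakeClock:
    """Hypothetical clock: sleep() only advances a counter."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def count_checks(timeout, interval, ready_on_check=None):
    """Return (checks made, simulated seconds, timed_out) for one wait."""
    clock = FakeClock()
    calls = {'n': 0}

    def check():
        calls['n'] += 1
        return ready_on_check is not None and calls['n'] >= ready_on_check

    try:
        wait_for(check, timeout=timeout, interval=interval,
                 clock=clock.time, sleep=clock.sleep)
        timed_out = False
    except TimeoutException:
        timed_out = True
    return calls['n'], clock.now, timed_out


print(count_checks(3600, 5))                        # (720, 3600.0, True)
print(count_checks(3600, 10))                       # (360, 3600.0, True)
print(count_checks(None, 10, ready_on_check=1000))  # (1000, 9990.0, False)
```

With interval=5 the condition is checked at most 720 times before TimeoutException is raised, assuming the check itself takes no time. The last line shows that only an explicit timeout=None removes the limit: the wait continues past 3600 simulated seconds and returns normally once the condition is met.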

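start_system_vms, which runs next, also has a timeout that caps the retries when starting the system VMs, but it is not covered in this article.
As a side note, in recent VxRail versions all system VMs are gathered on the primary node, so starting them is effectively the primary node's job. In older versions this was not necessarily the case, and a system VM could also start on a secondary node.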