Bright Cluster Manager 6.1
Administrator Manual
Revision: 4724
Date: Thu, 05 Dec 2013
©2013 Bright Computing, Inc. All Rights Reserved. This manual or parts
thereof may not be reproduced in any form unless permitted by contract
or by written permission of Bright Computing, Inc.
Trademarks
Linux is a registered trademark of Linus Torvalds. PathScale is a
registered trademark of Cray, Inc. Red Hat and all Red Hat-based trademarks
are trademarks or registered trademarks of Red Hat, Inc. SUSE is a
registered trademark of Novell, Inc. PGI is a registered trademark of The
Portland Group Compiler Technology, STMicroelectronics, Inc. SGE is a
trademark of Sun Microsystems, Inc. FLEXlm is a registered trademark
of Globetrotter Software, Inc. Maui Cluster Scheduler is a trademark of
Adaptive Computing, Inc. ScaleMP is a registered trademark of ScaleMP,
Inc. All other trademarks are the property of their respective owners.
Rights and Restrictions
All statements, specifications, recommendations, and technical
information contained herein are current or planned as of the date of
publication of this document. They are reliable as of the time of this
writing and are presented without warranty of any kind, expressed or
implied. Bright Computing, Inc. shall not be liable for technical or
editorial errors or omissions which may occur in this document. Bright
Computing, Inc. shall not be liable for any damages resulting from the
use of this document.
Limitation of Liability and Damages Pertaining to Bright Computing, Inc.
The Bright Cluster Manager product principally consists of free software
that is licensed by the Linux authors free of charge. Bright Computing,
Inc. shall have no liability nor will Bright Computing, Inc. provide any
warranty for the Bright Cluster Manager to the extent that is permitted
by law. Unless confirmed in writing, the Linux authors and/or third
parties provide the program as is without any warranty, either expressed
or implied, including, but not limited to, marketability or suitability
for a specific purpose. The user of the Bright Cluster Manager product
shall accept the full risk for the quality or performance of the product.
Should the product malfunction, the costs for repair, service, or
correction will be borne by the user of the Bright Cluster Manager
product. No copyright owner or third party who has modified or
distributed the program as permitted in this license shall be held liable
for damages, including general or specific damages, damages caused by
side effects or consequential damages, resulting from the use of the
program or the un-usability of the program (including, but not limited
to, loss of data, incorrect processing of data, losses that must be borne
by you or others, or the inability of the program to work together with
any other program), even if a copyright owner or third party had been
advised about the possibility of such damages unless such copyright
owner or third party has signed a writing to the contrary.
Table of Contents
Table of Contents...........................i
0.1 Quickstart............................xvii
0.2 About This Manual.......................xvii
0.3 Getting Administrator-Level Support............xvii
1 Introduction 1
1.1 What Is Bright Cluster Manager?...............1
1.2 Cluster Structure........................1
1.3 Bright Cluster Manager Administrator And User Environment.....3
1.4 Organization Of This Manual.................3
2 Installing Bright Cluster Manager 5
2.1 Minimal Hardware Requirements...............5
2.1.1 Head Node.......................5
2.1.2 Compute Nodes....................5
2.2 Supported Hardware......................6
2.2.1 Compute Nodes....................6
2.2.2 Ethernet Switches....................6
2.2.3 Power Distribution Units...............6
2.2.4 Management Controllers...............6
2.2.5 InfiniBand........................6
2.3 Head Node Installation: Bare Metal Method.........7
2.3.1 Welcome Screen.....................7
2.3.2 Software License....................7
2.3.3 Kernel Modules Configuration............9
2.3.4 Hardware Overview..................10
2.3.5 Nodes Configuration..................10
2.3.6 Network Topology...................11
2.3.7 Additional Network Configuration.........13
2.3.8 Networks Configuration................15
2.3.9 Nameservers And Search Domains..........16
2.3.10 Network Interfaces Configuration..........16
2.3.11 Select Subnet Managers................18
2.3.12 Select CD/DVD ROM.................19
2.3.13 Workload Management Configuration........20
2.3.14 Disk Partitioning And Layouts............21
2.3.15 Time Configuration...................23
2.3.16 Cluster Access......................24
2.3.17 Authentication.....................24
2.3.18 Console.........................25
2.3.19 Summary........................25
2.3.20 Installation........................26
2.4 Head Node Installation: Add-On Method..........28
2.4.1 Prerequisites.......................28
2.4.2 Installing The Installer.................28
2.4.3 Running The Installer.................29
3 Cluster Management With Bright Cluster Manager 35
3.1 Concepts.............................35
3.1.1 Devices..........................35
3.1.2 Software Images....................36
3.1.3 Node Categories....................37
3.1.4 Node Groups......................38
3.1.5 Roles...........................38
3.2 Modules Environment.....................39
3.2.1 Adding And Removing Modules...........39
3.2.2 Using Local And Shared Modules..........39
3.2.3 Setting Up A Default Environment For All Users..40
3.2.4 Creating A Modules Environment Module.....41
3.3 Authentication..........................42
3.3.1 Changing Administrative Passwords On The Cluster 42
3.3.2 Logins Using ssh....................43
3.3.3 Certificates........................43
3.3.4 Profiles..........................44
3.4 Cluster Management GUI...................45
3.4.1 Installing Cluster Management GUI On The Desktop 45
3.4.2 Navigating The Cluster Management GUI.....50
3.4.3 Advanced cmgui Features..............52
3.5 Cluster Management Shell...................56
3.5.1 Invoking cmsh.....................56
3.5.2 Levels, Modes, Help, And Commands Syntax In cmsh.....58
3.5.3 Working With Objects.................60
3.5.4 Accessing Cluster Settings...............69
3.5.5 Advanced cmsh Features...............70
3.6 Cluster Management Daemon.................74
3.6.1 Controlling The Cluster Management Daemon...75
3.6.2 Configuring The Cluster Management Daemon..76
3.6.3 Configuring The Cluster Management Daemon Logging Facilities.....76
3.6.4 Configuration File Modification...........77
3.6.5 Configuration File Conflicts Between The Standard Distribution And Bright Cluster Manager For Generated And Non-Generated Files.....78
4 Configuring The Cluster 79
4.1 The Cluster License.......................80
4.1.1 Displaying License Attributes.............80
4.1.2 Verifying A License—The verify-license Utility 81
4.1.3 Requesting And Installing A License Using A Product Key.....83
4.2 Main Cluster Configuration Settings.............90
4.2.1 Cluster Configuration: Various Name-Related Settings.....90
4.2.2 Cluster Configuration: Some Network-Related Settings.....91
4.2.3 Miscellaneous Settings.................91
4.3 Network Settings........................92
4.3.1 Configuring Networks.................93
4.3.2 Adding Networks...................96
4.3.3 Changing Network Parameters............97
4.4 Configuring Bridge Interfaces.................106
4.5 Configuring VLAN Interfaces.................108
4.5.1 Configuring A VLAN Interface Using cmsh....108
4.5.2 Configuring A VLAN Interface Using cmgui....109
4.6 Configuring Bonded Interfaces................109
4.6.1 Adding A Bonded Interface..............109
4.6.2 Single Bonded Interface On A Regular Node....110
4.6.3 Multiple Bonded Interface On A Regular Node...111
4.6.4 Bonded Interfaces On Head Nodes And HA Head Nodes.....111
4.6.5 Tagged VLAN On Top Of A Bonded Interface....112
4.6.6 Further Notes On Bonding..............112
4.7 Configuring InfiniBand Interfaces...............112
4.7.1 Installing Software Packages.............112
4.7.2 Subnet Managers....................113
4.7.3 InfiniBand Network Settings.............114
4.7.4 Verifying Connectivity.................115
4.8 Configuring BMC (IPMI/iLO) Interfaces...........116
4.8.1 BMC Network Settings.................116
4.8.2 BMC Authentication..................118
4.8.3 Interfaces Settings...................119
4.9 Configuring Switches And PDUs...............119
4.9.1 Configuring With The Manufacturer’s Configuration Interface.....119
4.9.2 Configuring SNMP...................119
4.9.3 Uplink Ports.......................120
4.9.4 The showport MAC Address to Port Matching Tool 121
4.10 Disk Layouts: Disked, Semi-Diskless, And Diskless Node Configuration.....122
4.10.1 Disk Layouts......................122
4.10.2 Disk Layout Assertions................122
4.10.3 Changing Disk Layouts................122
4.10.4 Changing A Disk Layout From Disked To Diskless 122
4.11 Configuring NFS Volume Exports And Mounts.......124
4.11.1 Exporting A Filesystem Using cmgui And cmsh..124
4.11.2 Mounting A Filesystem Using cmgui And cmsh..126
4.11.3 Mounting A Filesystem Subtree For A Diskless Node Over NFS.....129
4.11.4 Mounting The Root Filesystem For A Diskless Node Over NFS.....131
4.11.5 Configuring NFS Volume Exports And Mounts Over RDMA With OFED Drivers.....133
4.12 Managing And Configuring Services.............134
4.12.1 Why Use The Cluster Manager For Services?....134
4.12.2 Managing And Configuring Services—Examples.....135
4.13 Managing And Configuring A Rack.............138
4.13.1 Racks...........................138
4.13.2 Rack View........................140
4.13.3 Assigning Devices To A Rack.............143
4.13.4 Assigning Devices To A Chassis...........144
4.13.5 An Example Of Assigning A Device To A Rack, And Of Assigning A Device To A Chassis.....153
4.14 Configuring A GPU Unit, And Configuring GPU Settings.....154
4.14.1 GPUs And GPU Units.................154
4.14.2 GPU Unit Configuration Example: The Dell PowerEdge C410x.....155
4.14.3 Configuring GPU Settings...............157
4.15 Configuring Custom Scripts..................160
4.15.1 custompowerscript.................160
4.15.2 custompingscript.................160
4.15.3 customremoteconsolescript..........161
4.16 Cluster Configuration Without Execution By CMDaemon.....161
4.16.1 Cluster Configuration: The Bigger Picture......161
4.16.2 Making Nodes Function Differently By Image...162
4.16.3 Making All Nodes Function Differently From Normal Cluster Behavior With FrozenFile.....164
4.16.4 Adding Functionality To Nodes Via An initialize Or finalize Script.....164
4.16.5 Examples Of Configuring Nodes With Or Without CMDaemon.....165
5 Power Management 169
5.1 Configuring Power Parameters................169
5.1.1 PDU-Based Power Control..............170
5.1.2 IPMI-Based Power Control..............172
5.1.3 Combining PDU- and IPMI-Based Power Control.172
5.1.4 Custom Power Control.................173
5.1.5 Hewlett Packard iLO-Based Power Control.....174
5.2 Power Operations........................175
5.2.1 Power Operations With cmgui............175
5.2.2 Power Operations Through cmsh..........177
5.3 Monitoring Power........................178
5.4 CPU Scaling Governors....................178
5.4.1 The Linux Kernel And CPU Scaling Governors...178
5.4.2 The Governor List According To sysinfo.....180
5.4.3 Setting The Governor.................180
6 Node Provisioning 183
6.1 Before The Kernel Loads....................183
6.1.1 PXE Booting.......................183
6.1.2 iPXE Booting From A Disk Drive...........186
6.1.3 iPXE Booting Using InfiniBand............186
6.1.4 Booting From The Drive................187
6.1.5 The Boot Role......................187
6.2 Provisioning Nodes.......................188
6.2.1 Provisioning Nodes:Configuration Settings....188
6.2.2 Provisioning Nodes:Role Setup With cmsh.....190
6.2.3 Provisioning Nodes:Role Setup With cmgui....191
6.2.4 Provisioning Nodes:Housekeeping.........192
6.3 The Kernel Image, Ramdisk And Kernel Modules.....194
6.3.1 Booting To A “Good State” Software Image.....194
6.3.2 Selecting Kernel Driver Modules To Load Onto Nodes.....195
6.3.3 InfiniBand Provisioning................196
6.4 Node-Installer..........................198
6.4.1 Requesting A Node Certificate............199
6.4.2 Deciding Or Selecting Node Configuration.....200
6.4.3 Starting Up All Network Interfaces.........212
6.4.4 Determining Install-mode Type And Execution Mode.....213
6.4.5 Running Initialize Scripts...............217
6.4.6 Checking Partitions, Mounting File Systems....218
6.4.7 Synchronizing The Local Drive With The Software Image.....218
6.4.8 Writing Network Configuration Files........223
6.4.9 Creating A Local /etc/fstab File.........223
6.4.10 Installing GRUB Bootloader..............223
6.4.11 Running Finalize Scripts................224
6.4.12 Unloading Specific Drivers..............225
6.4.13 Switching To The Local init Process........225
6.5 Node States...........................225
6.5.1 Node States Icons In cmgui..............225
6.5.2 Node States Shown In cmsh..............226
6.5.3 Node States Indicating Regular Start Up......226
6.5.4 Node States That May Indicate Problems......227
6.6 Updating Running Nodes...................230
6.6.1 Updating Running Nodes: Configuration With excludelistupdate.....231
6.6.2 Updating Running Nodes: With cmsh Using imageupdate.....233
6.6.3 Updating Running Nodes: With cmgui Using The “Update node” Button.....234
6.6.4 Updating Running Nodes: Considerations.....234
6.7 Adding New Nodes......................235
6.7.1 Adding New Nodes With cmsh And cmgui Add Functions.....235
6.7.2 Adding New Nodes With The Node Creation Wizard.....235
6.8 Troubleshooting The Node Boot Process...........237
6.8.1 Node Fails To PXE Boot................237
6.8.2 Node-installer Logging................238
6.8.3 Provisioning Logging.................239
6.8.4 Ramdisk Fails During Loading Or Sometime Later 239
6.8.5 Ramdisk Cannot Start Network............239
6.8.6 Node-Installer Cannot Create Disk Layout.....240
6.8.7 Node-Installer Cannot Start BMC (IPMI/iLO) Interface.....242
7 Cloudbursting 245
7.1 Cluster-On-Demand Cloudbursting.............245
7.1.1 Cluster-On-Demand: Launching The Head Node From The Cloud Provider.....246
7.1.2 Cluster-On-Demand: Head Node Login And Cluster Configuration.....253
7.1.3 Cluster-On-Demand: Connecting To The headnode Via cmsh or cmgui.....257
7.1.4 Cluster-On-Demand: Cloud Node Start-up.....258
7.2 Cluster Extension Cloudbursting...............260
7.2.1 Cluster Extension: Cloud Provider Login And Cloud Director Configuration.....262
7.2.2 Cluster Extension: Cloud Director Start-up.....267
7.2.3 Cluster Extension: Cloud Node Start-up.......271
7.3 Cloudbursting Using The Command Line And cmsh...273
7.3.1 The cloud-setup Script...............274
7.3.2 Launching The Cloud Director............275
7.3.3 Launching The Cloud Nodes.............276
7.3.4 Submitting Jobs With cmsub.............277
7.3.5 Miscellaneous Cloud Commands...........277
7.4 Cloud Considerations And Issues With Bright Cluster Manager.....282
7.4.1 Differences Between Cluster-On-Demand And Cluster Extension.....282
7.4.2 Hardware And Software Availability........283
7.4.3 Reducing Running Costs...............283
7.4.4 Address Resolution In Cluster Extension Networks 284
7.5 Virtual Private Clouds.....................286
7.5.1 EC2-Classic And EC2-VPC..............287
7.5.2 Comparison Of EC2-Classic And EC2-VPC Platforms.....289
7.5.3 Setting Up And Creating A Custom VPC......290
8 User Management 297
8.1 Managing Users And Groups With cmgui..........297
8.2 Managing Users And Groups With cmsh..........299
8.2.1 Adding A User.....................299
8.2.2 Saving The Modified State...............300
8.2.3 Editing Properties Of Users And Groups......301
8.2.4 Reverting To The Unmodified State.........303
8.2.5 Removing A User....................304
8.3 Using An External LDAP Server...............304
8.3.1 External LDAP Server Replication..........307
8.3.2 High Availability....................309
8.4 Tokens And Profiles.......................310
8.4.1 Modifying Profiles...................311
8.4.2 Creation Of Custom Certificates With Profiles, For Users Managed By Bright Cluster Manager’s Internal LDAP.....311
8.4.3 Creation Of Custom Certificates With Profiles, For Users Managed By An External LDAP.....314
8.4.4 Logging The Actions Of CMDaemon Users.....315
9 Workload Management 317
9.1 Workload Managers Choices.................317
9.2 Forcing Jobs To Run In A Workload Management System.....318
9.2.1 Disallowing User Logins To Regular Nodes Via cmsh.....318
9.2.2 Disallowing User Logins To Regular Nodes Via cmgui.....319
9.2.3 Disallowing Other User Processes Outside Of Workload Manager User Processes.....319
9.3 Installation Of Workload Managers..............319
9.3.1 Setting Up, Enabling, And Disabling The Workload Manager With wlm-setup.....319
9.3.2 Other Options With wlm-setup...........321
9.4 Enabling, Disabling, And Monitoring Workload Managers 322
9.4.1 Enabling And Disabling A Workload Manager With cmgui.....322
9.4.2 Enabling And Disabling A Workload Manager With cmsh.....325
9.4.3 Monitoring The Workload Manager Services....327
9.5 Configuring And Running Individual Workload Managers 328
9.5.1 Configuring And Running Slurm...........328
9.5.2 Configuring And Running SGE............333
9.5.3 Configuring And Running Torque..........334
9.5.4 Configuring And Running PBS Pro.........336
9.5.5 Installing, Configuring And Running openlava..338
9.5.6 Installing, Configuring, And Running LSF.....340
9.6 Using cmgui With Workload Management.........345
9.6.1 Jobs Display And Handling In cmgui........345
9.6.2 Queues Display And Handling In cmgui......347
9.6.3 Nodes Display And Handling In cmgui.......349
9.7 Using cmsh With Workload Management..........350
9.7.1 Jobs Display And Handling In cmsh: jobs Mode.350
9.7.2 Job Queues Display And Handling In cmsh: jobqueue Mode.....351
9.7.3 Nodes Drainage Status And Handling In cmsh...354
9.7.4 Launching Jobs With cm-launcher.........355
9.8 Examples Of Workload Management Assignment.....356
9.8.1 Setting Up A New Category And A New Queue For It.....356
9.8.2 Setting Up A Prejob Health Check..........358
9.9 Power Saving Features.....................360
9.9.1 Slurm...........................360
10 Post-Installation Software Management 361
10.1 Bright Cluster Manager RPM Packages And Their Naming Convention.....362
10.2 Managing Packages On The Head Node...........363
10.2.1 Managing RPM Packages On The Head Node...363
10.2.2 Managing Non-RPM Software On The Head Node 364
10.3 Kernel Management On A Head Node Or Image......365
10.3.1 Installing A Standard Distribution Kernel......365
10.3.2 Excluding Kernels And Other Packages From Updates.....366
10.3.3 Updating A Kernel In A Software Image......367
10.3.4 Setting Kernel Options For Software Images....368
10.3.5 Kernel Driver Modules................368
10.4 Managing An RPM Package In A Software Image And Running It On Nodes.....371
10.4.1 Installing From Head Via chroot: Installing Into The Image.....371
10.4.2 Installing From Head Via chroot: Updating The Node.....372
10.4.3 Installing From Head Via rpm --root, yum --installroot Or chroot: Possible Issues.....372
10.5 Managing Non-RPM Software In A Software Image And Running It On Nodes.....373
10.5.1 Managing The Software Directly On An Image...374
10.5.2 Managing The Software Directly On A Node, Then Syncing Node-To-Image.....374
10.6 Creating A Custom Software Image.............378
10.6.1 Creating A Base Distribution Archive From A Base Host.....378
10.6.2 Creating The Software Image With cm-create-image.....380
10.6.3 Configuring Local Repositories For Linux Distributions, And For The Bright Cluster Manager Package Repository, For A Software Image.....386
10.6.4 Creating A Custom Image From The Local Repository.....389
10.7 Blocking Major OS Updates With The cm-dist-limit-<DIST> package.....389
11 Cluster Monitoring 391
11.1 A Basic Example Of How Monitoring Works........391
11.1.1 Before Using The Framework—Setting Up The Pieces.....392
11.1.2 Using The Framework.................392
11.2 Monitoring Concepts And Definitions............395
11.2.1 Metric..........................395
11.2.2 Action..........................395
11.2.3 Threshold........................395
11.2.4 Health Check......................396
11.2.5 Conceptual Overview: Health Checks Vs Threshold Checks.....396
11.2.6 Severity.........................397
11.2.7 AlertLevel........................397
11.2.8 InfoMessages......................397
11.2.9 Flapping.........................398
11.2.10 Transition........................398
11.2.11 Conceptual Overview: cmgui’s Main Monitoring Interfaces.....398
11.3 Monitoring Visualization With cmgui............400
11.3.1 The Monitoring Window...............400
11.3.2 The Graph Display Pane................401
11.3.3 Using The Grid Wizard................404
11.3.4 Zooming In With Mouse Gestures..........406
11.3.5 The Graph Display Settings Dialog..........407
11.4 Monitoring Configuration With cmgui............408
11.4.1 The Overview Tab...................409
11.4.2 The Metric Configuration Tab.............409
11.4.3 Health Check Configuration Tab...........415
11.4.4 Metrics Tab.......................418
11.4.5 Health Checks Tab...................422
11.4.6 Actions Tab.......................423
11.5 Overview Of Monitoring Data For Devices.........424
11.6 Event Viewer..........................424
11.6.1 Viewing The Events In cmgui............424
11.6.2 Using The Event Bucket From The Shell For Events And For Tagging Device States.....425
11.7 The monitoring Modes Of cmsh..............427
11.7.1 The monitoring actions Mode In cmsh....428
11.7.2 The monitoring healthchecks Mode in cmsh 431
11.7.3 The monitoring metrics Mode In cmsh....434
11.7.4 The monitoring setup Mode in cmsh......435
11.8 Obtaining Monitoring Data Values..............443
11.8.1 The Latest Data Values—The latest*data Commands.....443
11.8.2 Data Values Over Time—The dump* Commands.....444
11.8.3 The check Command For On-Demand Health Checks.....448
11.9 The User Portal.........................449
11.9.1 Accessing The User Portal...............449
11.9.2 Disabling The User Portal...............450
11.9.3 User Portal Home Page................450
12 Day-to-day Administration 453
12.1 Parallel Shell...........................453
12.1.1 pexec In The OS Shell.................454
12.1.2 pexec In cmsh.....................455
12.1.3 pexec In cmgui....................455
12.1.4 Using The -j|-join Option Of pexec.......456
12.1.5 Other Parallel Commands...............456
12.2 Getting Support With Cluster Manager Issues........457
12.2.1 Support Via E-mail...................457
12.2.2 Reporting Cluster Manager Diagnostics With cm-diagnose.....458
12.2.3 Requesting Remote Support With request-remote-assistance.....459
12.3 Backups.............................460
12.3.1 Cluster Installation Backup..............460
12.3.2 Local Database Backups And Restoration......461
12.4 BIOS Configuration And Updates...............464
12.4.1 BIOS Configuration..................464
12.4.2 Updating BIOS.....................465
12.4.3 Booting DOS Image..................465
12.5 Hardware Match Check....................466
12.6 Serial Over LAN Console Access...............467
12.6.1 Background Notes On Serial Console And SOL..467
12.6.2 SOL Console Configuration And Access With cmgui 469
12.6.3 SOL Console Configuration And Access With cmsh 470
12.6.4 The conman Serial Console Logger And Viewer..471
13 Third Party Software 475
13.1 Modules Environment.....................475
13.2 Shorewall.............................475
13.2.1 The Shorewall Service Paradigm...........475
13.2.2 Shorewall Zones, Policies, And Rules........476
13.2.3 Clear And Stop Behavior In service Options, bash Shell Command, And cmsh Shell.....476
13.2.4 Further Shorewall Quirks...............477
13.3 Compilers............................478
13.3.1 GCC...........................478
13.3.2 Intel Compiler Suite..................478
13.3.3 PGI High-Performance Compilers..........480
13.3.4 AMD Open64 Compiler Suite.............480
13.3.5 FLEXlm License Daemon...............480
13.4 Intel Cluster Checker......................481
13.4.1 Package Installation..................481
13.4.2 Preparing Configuration And Node List Files...482
13.4.3 Running Intel Cluster Checker............484
13.4.4 Applying For The Certificate.............485
13.5 CUDA For GPUs........................485
13.5.1 Installing CUDA....................485
13.5.2 Installing Kernel Development Packages......488
13.5.3 Verifying CUDA....................488
13.5.4 Verifying OpenCL...................489
13.5.5 Configuring The X server...............490
13.6 OFED Software Stack......................491
13.6.1 Choosing A Distribution Version Or Bright Cluster Manager Version, Ensuring The Kernel Matches, And Logging The Installation.....491
13.6.2 Mellanox and QLogic OFED Stack Installation Using The Bright Computing Repository.....492
13.7 Lustre...............................495
13.7.1 Architecture.......................495
13.7.2 Server Implementation.................495
13.7.3 Client Implementation.................500
13.8 ScaleMP.............................504
13.8.1 Installing vSMP For Cloud..............504
13.8.2 Creating Virtual SMP Nodes.............505
13.8.3 Virtual SMP Node Settings..............505
14 MIC Configuration 507
14.1 Introduction...........................507
14.2 MIC Software Installation...................508
14.2.1 MIC Software Packages................508
14.2.2 MIC Environment MIC Commands.........509
14.2.3 Bright Computing MIC Tools.............510
14.2.4 MIC OFED Installation.................510
14.3 MIC Configuration.......................511
14.3.1 Using cm-mic-setup To Configure MICs.....511
14.3.2 Using cmsh To Configure Some MIC Properties..514
14.3.3 Using cmgui To Configure Some MIC Properties.515
14.3.4 Using MIC Overlays To Place Software On The MIC 519
14.4 MIC Card Flash Updates....................521
14.5 Other MIC Administrative Tasks...............523
14.5.1 How CMDaemon Manages MIC Cards.......523
14.5.2 Using Workload Managers With MIC........524
14.5.3 Mounting The Root Filesystem For A MIC Over NFS 525
14.5.4 MIC Metrics.......................526
14.5.5 User Management On The MIC............526
15 High Availability 529
15.1 HA Concepts..........................529
15.1.1 Primary, Secondary, Active, Passive.........529
15.1.2 Monitoring The Active Head Node, Initiating Failover.....530
15.1.3 Services In Bright Cluster Manager HA Setups...530
15.1.4 Failover Network Topology..............531
15.1.5 Shared Storage.....................532
15.1.6 Guaranteeing One Active Head At All Times....534
15.1.7 Automatic Vs Manual Failover............535
15.1.8 HA And Cloud Nodes.................536
15.2 HA Setup Procedure Using cmha-setup..........536
15.2.1 Preparation.......................538
15.2.2 Cloning.........................539
15.2.3 Shared Storage Setup..................542
15.2.4 Automated Failover And Relevant Testing.....544
15.3 Running cmha-setup Without Ncurses, Using An XML Specification.....545
15.3.1 Why Run It Without Ncurses?............545
15.3.2 The Syntax Of cmha-setup Without Ncurses...546
15.3.3 Example cmha-setup Run Without Ncurses....546
15.4 Managing HA..........................547
15.4.1 Changing An Existing Failover Configuration...547
15.4.2 cmha Utility.......................547
15.4.3 States...........................551
15.4.4 Failover Action Decisions...............552
15.4.5 Keeping Head Nodes In Sync.............553
15.4.6 High Availability Parameters.............554
15.4.7 Handling And Viewing Failover Via cmgui.....556
15.4.8 Re-cloning A Head Node...............557
A Generated Files 559
A.1 Files Generated Automatically On Head Nodes.......559
A.2 Files Generated Automatically In Software Images....562
A.3 Files Generated Automatically On Regular Nodes.....563
A.4 Files Not Generated,But Installed...............563
B Bright Computing Public Key 569
C CMDaemon Configuration File Directives 571
D Disk Partitioning 591
D.1 Structure Of Partitioning Definition.............591
D.2 Example: Default Node Partitioning.............596
D.3 Example: Software RAID....................597
D.4 Example: Software RAID With Swap.............598
D.5 Example: Logical Volume Manager..............599
D.6 Example: Diskless........................600
D.7 Example: Semi-diskless....................601
D.8 Example: Preventing Accidental Data Loss.........602
D.9 Example: Using Custom Assertions.............603
E Example initialize And finalize Scripts 605
E.1 When Are They Used?.....................605
E.2 Accessing From cmgui And cmsh..............605
E.3 Environment Variables Available To initialize And finalize Scripts.....606
E.4 Using Environment Variables Stored In Multiple Variables 609
E.5 Storing A Configuration To A Filesystem..........610
E.5.1 Storing With Initialize Scripts.............610
E.5.2 Ways Of Writing A Finalize Script To Configure The Destination Nodes.....611
E.5.3 Restricting The Script To Nodes Or Node Categories 612
F Quickstart Installation Guide 613
F.1 Installing The Head Node...................613
F.2 First Boot.............................616
F.3 Booting Regular Nodes.....................617
F.4 Running Cluster Management GUI..............618
G Workload Managers Quick Reference 621
G.1 Slurm...............................621
G.2 Sun Grid Engine.........................622
G.3 Torque..............................624
G.4 PBS Pro..............................624
G.5 openlava.............................625
H Metrics, Health Checks, And Actions 627
H.1 Metrics And Their Parameters.................627
H.1.1 Metrics..........................627
H.1.2 Parameters For Metrics................635
H.2 Health Checks And Their Parameters............639
H.2.1 Health Checks......................639
H.2.2 Parameters For Health Checks............644
H.3 Actions And Their Parameters................645
H.3.1 Actions..........................645
H.3.2 Parameters For Actions................646
I Metric Collections 649
I.1 Metric Collections Added Using cmsh............649
I.2 Metric Collections Initialization................649
I.3 Metric Collections Output During Regular Use.......650
I.4 Error Handling.........................651
I.5 Environment Variables.....................651
I.6 Metric Collections Examples..................653
I.7 iDataPlex And Similar Units..................653
J Changing The Network Parameters Of The Head Node 657
J.1 Introduction...........................657
J.2 Method..............................657
J.3 Terminology...........................659
K Bright Cluster Manager Python API 661
K.1 Installation............................661
K.1.1 Windows Clients....................661
K.1.2 Linux Clients......................662
K.2 Examples.............................662
K.2.1 First Program......................662
K.3 Methods And Properties....................664
K.3.1 Viewing All Properties And Methods........664
K.3.2 Property Lists......................664
K.3.3 Creating New Objects.................664
K.3.4 List Of Objects.....................665
K.3.5 Useful Methods.....................667
K.3.6 Useful Example Program...............668
L Workload Manager Configuration Files Updated By CMDaemon 671
L.1 Slurm...............................671
L.2 Grid Engine...........................671
L.3 Torque..............................672
L.4 PBS Pro..............................672
L.5 LSF................................672
L.6 openlava.............................673
M Linux Distributions That Use Registration 675
M.1 Registering A Red Hat Enterprise Linux Based Cluster..675
M.1.1 Registering A Head Node With RHEL........675
M.1.2 Registering A Software Image With RHEL.....677
M.2 Registering A SUSE Linux Enterprise Server Based Cluster 679
M.2.1 Registering A Head Node With SUSE........679
M.2.2 Registering A Software Image With SUSE......679
N Burning Nodes 681
N.1 Test Scripts Deployment....................681
N.2 Burn Configurations......................681
N.2.1 Mail Tag.........................682
N.2.2 Pre-install And Post-install..............682
N.2.3 Post-burn Install Mode.................682
N.2.4 Phases..........................683
N.2.5 Tests...........................683
N.3 Running A Burn Configuration................683
N.3.1 Burn Configuration And Execution In cmgui....683
N.3.2 Burn Configuration And Execution In cmsh....684
N.3.3 Writing A Test Script..................690
N.3.4 Burn Failures......................693
N.4 Relocating The Burn Logs...................694
N.4.1 Configuring The Relocation..............694
N.4.2 Testing The Relocation.................695
O Changing The LDAP Password 697
O.1 Setting A New Password For The LDAP Server.......697
O.2 Setting The New Password In cmd.conf..........697
O.3 Checking LDAP Access....................698
P Configuring SELinux 699
P.1 Introduction...........................699
P.2 Enabling SELinux On SLES11SP2 Systems..........699
P.2.1 Regular Nodes.....................700
P.2.2 Head Node.......................701
P.3 Enabling SELinux on RHEL6.................701
P.3.1 Regular Nodes.....................701
P.3.2 Head Node.......................702
P.4 Additional Considerations...................702
P.5 Filesystem Security Context Checks.............702
Preface
Welcome to the Administrator Manual for the Bright Cluster Manager 6.1
cluster environment.
0.1 Quickstart
For readers who want to get a cluster up and running as quickly as pos-
sible with Bright Cluster Manager,Appendix F is a quickstart installation
guide.
0.2 About This Manual
The rest of this manual is aimed at helping system administrators install, understand, and manage a cluster running Bright Cluster Manager so as to get the best out of it.
The Administrator Manual covers administration topics which are specific to the Bright Cluster Manager environment. Readers should already be familiar with basic Linux system administration, which the manual does not generally cover. Aspects of system administration that require a more advanced understanding of Linux concepts for clusters are explained appropriately.
This manual is not intended for users interested only in interacting with the cluster to run compute jobs. The User Manual is intended to get such users up to speed with the user environment and workload management system.
Updated versions of the Administrator Manual, as well as the User Manual, are always available on the cluster at /cm/shared/docs/cm.
The manuals constantly evolve to keep up with the development of the Bright Cluster Manager environment and the addition of new hardware and/or applications.
The manuals also regularly incorporate customer feedback. Administrator and user input is greatly valued at Bright Computing, so any comments, suggestions, or corrections are gratefully received at manuals@brightcomputing.com.
0.3 Getting Administrator-Level Support
Unless the Bright Cluster Manager reseller offers support, support is provided by Bright Computing over e-mail via support@brightcomputing.com. Section 12.2 has more details on working with support.
1 Introduction
1.1 What Is Bright Cluster Manager?
Bright Cluster Manager 6.1 is a cluster management application built on top of a major Linux distribution. It is available for:
• Scientific Linux 5 and 6 (x86_64 only)
• Red Hat Enterprise Linux Server 5 and 6 (x86_64 only)
• CentOS 5 and 6 (x86_64 only)
• SUSE Enterprise Server 11 (x86_64 only)
This chapter introduces some basic features of Bright Cluster Manager
and describes a basic cluster in terms of its hardware.
1.2 Cluster Structure
In its most basic form, a cluster running Bright Cluster Manager contains:
• One machine designated as the head node
• Several machines designated as compute nodes
• One or more (possibly managed) Ethernet switches
• One or more power distribution units (optional)
The head node is the most important machine within a cluster because it controls all other devices, such as compute nodes, switches, and power distribution units. Furthermore, the head node is also the host that all users (including the administrator) log in to. The head node is the only machine that is connected directly to the external network and is usually the only machine in a cluster that is equipped with a monitor and keyboard. The head node provides several vital services to the rest of the cluster, such as central data storage, workload management, user management, and DNS and DHCP services. The head node in a cluster is also frequently referred to as the master node.
Often, the head node is replicated to a second head node, frequently called a passive head node. If the active head node fails, the passive head node can become active and take over. This is known as a high availability setup, and is a typical configuration (Chapter 15) in Bright Cluster Manager.
© Bright Computing,Inc.
A cluster normally contains a considerable number of non-head, or regular nodes, also referred to simply as nodes.
Most of these nodes are compute nodes. Compute nodes are the machines that do the heavy work when a cluster is being used for large computations. In addition to compute nodes, larger clusters may have other types of nodes as well (e.g. storage nodes and login nodes). Nodes can easily be installed through the (network-bootable) node provisioning system that is included with Bright Cluster Manager. Every time a compute node is started, the software installed on its local hard drive is synchronized automatically against a software image which resides on the head node. This ensures that a node can always be brought back to a “known state”. The node provisioning system greatly eases compute node administration and makes it trivial to replace an entire node in the event of hardware failure. Software changes need to be carried out only once (in the software image), and can easily be undone. In general, there will rarely be a need to log on to a compute node directly.
In most cases, a cluster has a private internal network, which is usually built from one or more managed Gigabit Ethernet switches, or made up of an InfiniBand fabric. The internal network connects all nodes to the head node and to each other. Compute nodes use the internal network for booting, data storage, and interprocess communication. In more advanced cluster setups, there may be several dedicated networks. Note that the external network (which could be a university campus network, company network, or the Internet) is not normally directly connected to the internal network. Instead, only the head node is connected to the external network.
Figure 1.1 illustrates a typical cluster network setup.
Figure 1.1: Cluster network
Most clusters are equipped with one or more power distribution units. These units supply power to all compute nodes and are also connected to the internal cluster network. The head node in a cluster can use the power distribution units to switch compute nodes on or off. From the head node, it is straightforward to power a large number of compute nodes on or off with a single command.
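From cmsh (the cluster management shell, section 3.5), such an operation might look like the following illustrative session. The prompt, the cluster and node names, and the node-range syntax shown here are assumptions for a default setup; the exact power control commands are covered in Chapter 5:

```
[root@mycluster ~]# cmsh
[mycluster]% device
[mycluster->device]% power on -n node001..node010
```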
1.3 Bright Cluster Manager Administrator And User Environment
Bright Cluster Manager contains several tools and applications to facilitate the administration and monitoring of a cluster. In addition, Bright Cluster Manager aims to provide users with an optimal environment for developing and running applications that require extensive computational resources.
• The administrator normally deals with the cluster software configuration via a front end to the Bright Cluster Manager. This can be GUI-based (cmgui, section 3.4), or shell-based (cmsh, section 3.5). Other tasks can be handled via special tools provided with Bright Cluster Manager, or the usual Linux tools. The use of Bright Cluster Manager tools is usually recommended over standard Linux tools because cluster administration often has special issues, including that of scale.
The Administrator Manual (this manual) covers how Bright Cluster Manager is used by the administrator to do all this. Additionally, some more obscure configuration cases can often be found in the Bright Computing knowledge base at http://kb.brightcomputing.com, along with some procedures that are not really within the scope of Bright Cluster Manager itself, but that may come up as part of related general Linux configuration. Online support is also available (section 12.2).
• The user normally interacts with the cluster by logging into a custom Linux user environment to run jobs. Details on how to do this from the perspective of the user are given in the User Manual.
1.4 Organization Of This Manual
The following chapters of this manual describe all aspects of Bright Cluster Manager from the perspective of a cluster administrator.
Chapter 2 gives step-by-step instructions for installing Bright Cluster Manager on the head node of a cluster. Readers with a cluster that was shipped with Bright Cluster Manager pre-installed may safely skip this chapter.
Chapter 3 introduces the main concepts and tools that play a central role in Bright Cluster Manager, laying down groundwork for the remaining chapters.
Chapter 4 explains how to configure and further set up the cluster after software installation of Bright Cluster Manager on the head node.
Chapter 5 describes how power management within the cluster works.
Chapter 6 explains node provisioning in detail.
Chapter 7 explains how to carry out cloudbursting.
Chapter 8 explains how accounts for users and groups are managed.
Chapter 9 explains how workload management is implemented and used.
Chapter 10 demonstrates a number of techniques and tricks for working with software images and keeping images up to date.
Chapter 11 explains how the monitoring features of Bright Cluster Manager can be used.
Chapter 12 summarizes several useful tips and tricks for day-to-day monitoring.
Chapter 13 describes a number of third-party software packages that play a role in Bright Cluster Manager.
Chapter 14 describes how the Intel MIC architecture integrates with Bright Cluster Manager.
Chapter 15 gives details and setup instructions for the high availability features provided by Bright Cluster Manager. These can be followed to build a cluster with redundant head nodes.
The appendices generally give supplementary details to the main text.
2 Installing Bright Cluster Manager
This chapter describes the installation of Bright Cluster Manager onto the head node of a cluster. Sections 2.1 and 2.2 list hardware requirements and supported hardware. Section 2.3 gives step-by-step instructions on installing Bright Cluster Manager from a DVD onto a head node that has no operating system running on it initially, while section 2.4 gives instructions on installing onto a head node that already has an operating system running on it.
Once the head node is installed, the other, regular, nodes can (PXE) boot off the head node and provision themselves from it with a default image, without requiring a Linux distribution DVD themselves. The regular nodes normally have any existing data wiped during the process of provisioning from the head node. The PXE boot and provisioning process for the regular nodes is described in Chapter 6.
The installation of software on an already-configured cluster running Bright Cluster Manager is described in Chapter 10.
2.1 Minimal Hardware Requirements
The following are minimal hardware requirements:
2.1.1 Head Node
• Intel Xeon or AMD Opteron CPU (64-bit)
• 2GB RAM
• 80GB disk space
• 2 Gigabit Ethernet NICs (for the most common Type 1 topology (section 2.3.6))
• DVD drive
2.1.2 Compute Nodes
• Intel Xeon or AMD Opteron CPU (64-bit)
• 1GB RAM (at least 4GB is recommended for diskless nodes)
• 1 Gigabit Ethernet NIC
2.2 Supported Hardware
The following hardware is supported:
2.2.1 Compute Nodes
• SuperMicro
• Cray
• Cisco
• Dell
• IBM
• Asus
• Hewlett Packard
Other brands are also expected to work, even if not explicitly supported.
2.2.2 Ethernet Switches
• HP ProCurve
• Nortel
• Cisco
• Dell
• SuperMicro
• Netgear
Other brands are also expected to work, although not explicitly supported.
2.2.3 Power Distribution Units
• APC (American Power Conversion) Switched Rack PDU
Other brands with the same SNMP MIB mappings are also expected to work, although not explicitly supported.
2.2.4 Management Controllers
• IPMI 1.5/2.0
• HP iLO 1/2/3
2.2.5 InfiniBand
• Mellanox HCAs, and most other InfiniBand HCAs
• Mellanox InfiniBand switches
• QLogic (Intel) InfiniBand switches
• Most other InfiniBand switches
2.3 Head Node Installation: Bare Metal Method
A bare metal installation, that is, installing the head node onto a machine with no operating system on it already, is the recommended option because it cannot run into issues from an existing configuration. An operating system from the ones listed in section 1.1 is installed during a bare metal installation. The alternative to a bare metal installation is the add-on installation of section 2.4.
To start a bare metal installation, the time in the BIOS of the head node is set to local time, and the head node is set to boot from DVD. The head node is then booted from the Bright Cluster Manager DVD.
2.3.1 Welcome Screen
The welcome screen (figure 2.1) displays version and license information. Two installation modes are available: normal mode and express mode. Selecting the express mode installs the head node with the predefined configuration that the DVD was created with. The administrator password automatically set when express mode is selected is: system. Clicking on the Continue button brings up the Bright Cluster Manager software license screen, described next.
Figure 2.1: Installation welcome screen for Bright Cluster Manager
2.3.2 Software License
The “Bright Computing Software License” screen (figure 2.2) explains the terms and conditions that apply to use of the Bright Cluster Manager software.
Accepting the terms and conditions, and clicking on the Continue button, leads to the Base Distribution EULA (End User License Agreement) (figure 2.3).
Accepting the terms and conditions of the base distribution EULA, and clicking on the Continue button, leads to two possibilities.
1. If express mode was selected earlier, then the installer skips ahead to the Summary screen (figure 2.28), where it shows an overview of the predefined installation parameters, and awaits user input to start the install.
2. Otherwise, if normal installation mode was selected earlier, then the “Kernel Modules” configuration screen is displayed, described next.
Figure 2.2: Bright Cluster Manager Software License
Figure 2.3: Base Distribution End User License Agreement
2.3.3 Kernel Modules Configuration
The Kernel Modules screen (figure 2.4) shows the kernel modules recommended for loading based on hardware auto-detection.
Figure 2.4: Kernel Modules Recommended For Loading After Probing
Clicking the + button opens an input box for adding a module name and optional module parameters (figure 2.5).
Figure 2.5: Adding Kernel Modules
Similarly, the - button removes a selected module from the list. The arrow buttons move a kernel module up or down in the list. The kernel module loading order decides the exact name assigned to a device (e.g. sda, sdb, eth0, eth1).
After optionally adding or removing kernel modules, clicking Continue leads to the “Hardware Information” overview screen, described next.
2.3.4 Hardware Overview
The “Hardware Information” screen (figure 2.6) provides an overview of the detected hardware, depending on the kernel modules that have been loaded. If any hardware is not detected at this stage, the “Go Back” button is used to go back to the “Kernel Modules” screen (figure 2.4) to add the appropriate modules, after which the “Hardware Information” screen is returned to, to see if the hardware has been detected. Clicking Continue in this screen leads to the Nodes configuration screen, described next.
Figure 2.6: Hardware Overview Based On Loaded Kernel Modules
2.3.5 Nodes Configuration
The Nodes screen (figure 2.7) configures the number of racks, the number of regular nodes, the node basename, the number of digits for nodes, and the hardware manufacturer.
The maximum number of digits is 5, to keep the hostname reasonably readable.
The “Node Hardware Manufacturer” selection option initializes any monitoring parameters relevant for that manufacturer’s hardware. If the manufacturer is not known, then Other is selected from the list.
Clicking Continue in this screen leads to the “Network Topology” selection screen, described next.
Figure 2.7: Nodes Configuration
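The node basename and digit count configured on this screen combine into hostnames such as node001. The naming scheme can be sketched as follows (a minimal illustration; the basename node, 3 digits, and the node count are assumptions matching a common default setup):

```python
def node_names(basename="node", digits=3, count=4):
    """Generate zero-padded node hostnames, e.g. node001..node004."""
    return [f"{basename}{i:0{digits}d}" for i in range(1, count + 1)]

print(node_names())  # ['node001', 'node002', 'node003', 'node004']
```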
2.3.6 Network Topology
Regular nodes are always located on an internal network, by default called Internalnet.
The “Network Topology” screen allows selection of one of three different network topologies.
A type 1 network (figure 2.8), with nodes connected on a private internal network. This is the default network setup. In this topology, a network packet from a head or regular node destined for any external network that the cluster is attached to, by default called Externalnet, can only reach the external network by being routed and forwarded at the head node itself. The packet routing for Externalnet is configured at the head node.
A type 2 network (figure 2.9) has its nodes connected via a router to a public network. In this topology, a network packet from a regular node destined for outside the cluster does not go via the head node, but uses the router to reach a public network. Packets destined for the head node, however, still go directly to the head node. Any routing for beyond the router is configured on the router, and not on the cluster or its parts. Care should be taken to avoid DHCP conflicts between the DHCP server on the head node and any existing DHCP server on the internal network if the cluster is being placed within an existing corporate network that is also part of Internalnet (there is no Externalnet in this topology). Typically, in the case where the cluster becomes part of an existing network, another router is configured and placed between the regular corporate machines and the cluster nodes to shield them from effects on each other.
A type 3 network (figure 2.10), with nodes connected on a routed public network. In this topology, a network packet from a regular node, destined for another network, uses a router to get to it. The head node, being on another network, can only be reached via a router too. The network the regular nodes are on is called Internalnet by default, and the network the head node is on is called Managementnet by default. Any routing configuration for beyond the routers that are attached to the Internalnet and Managementnet networks is configured on the routers, and not on the cluster or its parts.
Selecting the network topology helps decide the predefined networks on the Networks settings screen later (figure 2.15). Clicking Continue here leads to the “Additional Network Configuration” screen, described next.
Figure 2.8: Network Topology: nodes connected on a private internal network
Figure 2.9: Network Topology: nodes connected on a public network
Figure 2.10: Network Topology: nodes connected on a routed public network
2.3.7 Additional Network Configuration
The “Additional Network Configuration” screen allows the configuration of high-speed interconnect networks, and also of BMC networks (figure 2.11).
Figure 2.11: Additional Network Configuration: OFED networking
Figure 2.12: Additional Network Configuration: Interconnect Interface
Figure 2.13: Additional Network Configuration: OFED stack
Figure 2.14: Additional Network Configuration: BMC type
• The interconnect selector options are to configure the compute nodes so that they communicate quickly with each other while running computational workload jobs.
The choices include 10 Gig-E and InfiniBand RDMA OFED (figure 2.12). The regular nodes of a cluster can be set to boot over the chosen option in both cases.
In the case of InfiniBand, there are OFED stack driver options (figure 2.13). The OFED stack used can be the parent distribution packaged stack, or it can be the appropriate (Mellanox 1.5, Mellanox 2.0, or QLogic) InfiniBand hardware vendor stack. Currently, choosing the parent distribution stack is recommended because it tends to be integrated better with the OS. OFED installation is discussed further in section 13.6.
• The BMC (Baseboard Management Controller) selector options configure the BMC network for the regular nodes (figure 2.14). The appropriate BMC network type (IPMI or iLO) should be chosen if a BMC is to be used. The remaining options—adding the network, and automatically configuring the network—can then be set. The BMC configuration is discussed further in section 4.8.
If a BMC is to be used, the BMC password is set to a random value. Retrieving and changing a BMC password is covered in section 4.8.2.
Clicking Continue in figure 2.11 leads to the Networks configuration screen, described next.
2.3.8 Networks Configuration
The Networks configuration screen (figure 2.15) displays the predefined list of networks, based on the selected network topology. IPMI and InfiniBand networks are defined based on selections made in the “Additional Network Configuration” screen earlier (figure 2.11).
The parameters of the network interfaces can be configured in this screen.
For a type 1 setup, an external network and an internal network are always defined.
For a type 2 setup, only an internal network is defined, and no external network is defined.
For a type 3 setup, an internal network and a management network are defined.
A pop-up screen is used to help fill these values in for a type 1 network. The values can be provided via DHCP, but usually static values are used in production systems to avoid confusion. The pop-up screen asks for IP address details for the external network, where the network externalnet corresponds to the site network that the cluster resides in (e.g. a corporate or campus network). The IP address details are therefore the details of the head node for a type 1 externalnet network (figure 2.8).
Clicking Continue in this screen validates all network settings. Invalid settings for any of the defined networks cause an alert to be displayed, explaining the error. A correction is then needed to proceed further.
If all settings are valid, the installation proceeds on to the Nameservers screen, described in the next section.
Figure 2.15: Networks Configuration
2.3.9 Nameservers And Search Domains
Search domains and external name servers can be added or removed using the Nameservers screen (figure 2.16). Using an external name server is recommended. Clicking on Continue leads to the “Network Interfaces” configuration screen, described next.
Figure 2.16: Nameservers and search domains
2.3.10 Network Interfaces Configuration
The “Network Interfaces” screen (figure 2.17) allows a review of the list of network interfaces with their proposed settings. The head node and regular nodes each have a settings pane for their network configurations.
Figure 2.17: Network Interface Configuration
An icon in the Head Node Interfaces section, where the hover text is showing in the figure, allows the Ethernet network interface order to be changed on the head node. For example, if the interfaces with the names eth0 and eth1 need to be swapped around, clicking on the icon brings up a screen allowing the names to be associated with specific MAC addresses (figure 2.18).
Figure 2.18: Network Interface Configuration Order Changing
For node network interfaces, the IP offset can be modified.¹
A different network can be selected for each interface using the drop-down box in the Network column. Selecting Unassigned disables a network interface.
If the corresponding network settings are changed (e.g., the base address of the network), the IP address of the head node interface needs to be modified accordingly. If IP address settings are invalid, an alert is displayed, explaining the error.
Clicking Continue on a “Network Interfaces” screen validates the IP address settings for all node interfaces.
If all settings are correct, and if InfiniBand networks have been defined, clicking on Continue leads to the “Subnet Managers” screen (figure 2.19), described in the next section.
If no InfiniBand networks are defined, or if InfiniBand networks have not been enabled on the networks settings screen, then clicking Continue instead leads to the CD/DVD ROMs selection screen (figure 2.20).
2.3.11 Select Subnet Managers
The “Subnet Managers” screen in figure 2.19 is only displayed if an InfiniBand network was defined, and lists all the nodes that can run the InfiniBand subnet manager. The nodes assigned the role of a subnet manager are ticked, and the Continue button is clicked to go on to the “CD/DVD ROMs” selection screen, described next.
¹ The IP offset is used to calculate the IP address assigned to a regular node interface. The nodes are conveniently numbered in a sequence, so their interfaces are typically also given a network IP address that is in a sequence on a selected network. In Bright Cluster Manager, interfaces by default have their IP addresses assigned to them sequentially, in steps of 1, starting after the network base address.
The default IP offset is 0.0.0.0, which means that the node interfaces by default start their range at the usual default values in their network.
With a modified IP offset, the point at which addressing starts is altered. For example, a different offset might be desirable when no IPMI network has been defined, but the nodes of the cluster do have IPMI interfaces in addition to the regular network interfaces. If a modified IP offset is not set for one of the interfaces, then the BOOTIF and ipmi0 interfaces get IP addresses assigned on the same network by default, which could be confusing. However, if an offset is entered for the ipmi0 interface, then the assigned IPMI IP addresses start from the IP address specified by the offset. That is, each modified IPMI address takes the value:
address that would be assigned by default + IP offset
Example
Taking the case where BOOTIF and IPMI interfaces would have IP addresses on the same network with the default IP offset: on a cluster of 10 nodes, a modified IPMI IP offset of 0.0.0.20 means that:
• the BOOTIF interfaces stay on 10.141.0.1, ..., 10.141.0.10, while
• the IPMI interfaces range from 10.141.0.21, ..., 10.141.0.30
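The offset arithmetic in the example above can be sketched with a short Python fragment. This is an illustration only: the network base address 10.141.0.0 and the 1-based node numbering are assumptions matching the example, not part of the installer itself:

```python
import ipaddress

def node_ip(base, index, offset="0.0.0.0"):
    """IP for node number `index` (1-based): network base + index + IP offset."""
    base_int = int(ipaddress.IPv4Address(base))
    offset_int = int(ipaddress.IPv4Address(offset))
    return str(ipaddress.IPv4Address(base_int + index + offset_int))

# BOOTIF interface of node001: default offset 0.0.0.0
print(node_ip("10.141.0.0", 1))               # 10.141.0.1
# IPMI interfaces of node001 and node010: modified offset 0.0.0.20
print(node_ip("10.141.0.0", 1, "0.0.0.20"))   # 10.141.0.21
print(node_ip("10.141.0.0", 10, "0.0.0.20"))  # 10.141.0.30
```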
Figure 2.19: Subnet Manager Nodes
2.3.12 Select CD/DVD ROM
The “CD/DVD ROMs” screen in figure 2.20 lists all detected CD/DVD-ROM devices. If multiple drives are found, then the drive with the Bright Cluster Manager DVD needs to be selected by the administrator. If the installation source is not detected, it can be added manually.
Optionally, a media integrity check can be set.
Clicking on the Continue button starts the media integrity check, if it was ordered. The media integrity check can take about a minute to run. If all is well, then the “Workload Management” setup screen is displayed, as described next.
Figure 2.20: DVD Selection
2.3.13 Workload Management Configuration
The “Workload Management” configuration screen (figure 2.21) allows selection from a list of supported workload managers. A workload management system is highly recommended to run multiple compute jobs on a cluster.
The Maui and Moab schedulers can be configured to run on the cluster if selected. However, these are not installed by the Bright Cluster Manager installer because Adaptive Computing prefers to distribute them directly. Details on installing the packages after the cluster has been installed are given in Chapter 9 on workload management.
To prevent a workload management system from being set up, select None. If a workload management system is selected, then the number of slots per node can be set; otherwise the slots setting is ignored. If no changes are made, then the number of slots defaults to the CPU count on the head node.
The head node can also be selected for use as a compute node, which can be a sensible choice on small clusters. The setting is ignored if no workload management system is selected.
Clicking Continue on this screen leads to the “Disk Partitioning and Layouts” screen, described next.
Figure 2.21: Workload Management Setup
2.3.14 Disk Partitioning And Layouts
The partitioning layout XMLschema is describedindetail inAppendix D.
The “Disk Partitioning and Layouts” configuration screen
(figure 2.22):
Figure 2.22:Disk Partitioning And Layouts
• selects the drive that the cluster manager is installed onto on the
head node.
• sets the disk partitioning layout for the head node and regular
nodes with the two options:“Head node disk layout” and
© Bright Computing,Inc.
22 Installing Bright Cluster Manager
“Node disk layout”.

*
The head node by default uses
 one big partitionif it has a drive size smaller thanabout
500GB
 several partitions if it has a drive size greater than or
equal to about 500GB.
*
The regular node by default uses several partitions.
Apartitioning layout other than the default can be selected for
installation from the drop-down boxes for the head node and
regular nodes.
– A partitioning layout is the only installation setting that cannot
easily be changed after the completion (section 2.3.20) of
installation. It should therefore be decided upon with care.
– A text editor pops up when the edit button of a partitioning
layout is clicked (figure 2.23). This allows the administrator to
view and change layout values within the layout’s configuration
XML file using the schema in Appendix D.1.
The Save and Reset buttons are enabled on editing, and save
or undo the text editor changes. Once saved, the changes cannot
be reverted automatically in the text editor, but must be
done manually.
The XML schema allows the definition of a great variety of layouts in the
layout’s configuration XML file. For example:
1. for a large cluster, or for a cluster that is generating a lot of
monitoring or burn data, the partition holding /var in the default
layout may fill up with log messages, since these are usually
stored under /var/log/. If /var is in a partition of its own, as
in the default partitioning layout presented when the hard drive is
about 500GB or more, then providing a larger size of partition than
the default for /var allows more logging to take place before /var
is full. Modifying the value found within the <size></size> tags
associated with that partition in the XML file (figure 2.23) modifies
the size of the partition that is to be installed.
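As a sketch of what such a change looks like, the fragment below shows
a /var partition entry with an enlarged size. The element names follow
the style of the schema in Appendix D.1, but the exact tag names, the
id value, and the 100G size used here are illustrative assumptions
rather than a copy of a shipped default layout:

```xml
<!-- Illustrative sketch only: tag names, id, and size value are
     assumptions modeled on the Appendix D.1 schema, not a shipped
     default. Enlarging <size> gives /var more room for logs. -->
<partition id="var">
  <size>100G</size>
  <type>linux</type>
  <filesystem>ext4</filesystem>
  <mountPoint>/var</mountPoint>
</partition>
```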
2. the administrator could specify the layout for multiple non-RAID
drives on the head node using one <blockdev></blockdev> tag
pair within an enclosing <device></device> tag pair for each drive.
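A minimal sketch of this second case might look as follows, assuming
hypothetical drive names /dev/sda and /dev/sdb; the partition entries
are placeholders, and the precise element set is defined by the schema
in Appendix D:

```xml
<!-- Illustrative sketch: one <blockdev> per drive, each inside its
     own <device> element. Drive names and partition details here
     are hypothetical placeholders. -->
<device>
  <blockdev>/dev/sda</blockdev>
  <partition id="a1">
    <size>max</size>
    <type>linux</type>
    <filesystem>ext4</filesystem>
    <mountPoint>/</mountPoint>
  </partition>
</device>
<device>
  <blockdev>/dev/sdb</blockdev>
  <partition id="b1">
    <size>max</size>
    <type>linux</type>
    <filesystem>ext4</filesystem>
    <mountPoint>/data</mountPoint>
  </partition>
</device>
```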
Figure 2.23: Edit Head Node Disk Partitioning
Clicking Continue on the “Disk Partitioning and Layouts”
screen leads to the “Time Configuration” screen, described next.
2.3.15 Time Configuration
The “Time Configuration” screen (figure 2.24) displays a predefined
list of time servers.
Figure 2.24: Time Configuration
Time servers can be removed by selecting a time server from the list
and clicking the - button. Additional time servers can be added by
entering the name of the time server and clicking the + button. A
timezone can be selected from the drop-down box if the default is
incorrect. Clicking Continue leads to the “Cluster Access” screen,
described next.
2.3.16 Cluster Access
The “Cluster Access” screen (figure 2.25) sets the existence of a
cluster management web portal service, and also sets network access
to several services.
Figure 2.25: Cluster Access
These services are the web portal, ssh, and the cluster management
daemon.
If restricting network access for a service is chosen, then an editable
list of networks that may access the service is displayed. By default
the list has no members. The installer will not move on to the next
screen until the list contains at least one network address in CIDR
format.
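For example, entries added to the list might look like the following;
the addresses shown are purely illustrative:

```
10.141.0.0/16
192.168.32.0/24
```

The /16 and /24 suffixes give the number of leading bits in the
network prefix, so the first entry covers the range
10.141.0.0–10.141.255.255.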
If the conditions for this screen are satisfied, then clicking Continue
leads to the Authentication screen, described next.
2.3.17 Authentication
The Authentication screen (figure 2.26) requires the password to be
set twice for the cluster administrator. The cluster name and the head
node hostname can also be set in this screen. Clicking Continue
validates the passwords that have been entered, and if successful,
leads to the Console screen, described next.
Figure 2.26: Authentication
2.3.18 Console
The Console screen (figure 2.27) allows selection of a graphical mode
or a text console mode for when the head node or regular nodes boot.
Clicking Continue leads to the Summary screen, described next.
Figure 2.27: Console
2.3.19 Summary
The Summary screen (figure 2.28) summarizes some of the installation
settings and parameters configured during the previous stages. If the
express mode installation was chosen, then it summarizes the predefined
settings and parameters. Changes to the values on this screen are made
by navigating to previous screens and correcting the values there.
When the summary screen displays the right values, clicking on the
Start button leads to the “Installation Progress” screen, described
next.
Figure 2.28: Summary of Installation Settings
2.3.20 Installation
The “Installation Progress” screen (figure 2.29) shows the
progress of the installation. It is not possible to navigate back to
previous screens once the installation has begun. When the installation
is complete (figure 2.30), the installation log can be viewed in detail
by clicking on “Install Log”.
The Reboot button restarts the machine. The BIOS boot order may
need changing, or the DVD should be removed, in order to boot from the
hard drive on which Bright Cluster Manager has been installed.
Figure 2.29: Installation Progress
Figure 2.30: Installation Completed
After rebooting, the system starts and presents a login prompt. After
logging in as root using the password that was set during the
installation procedure, the system is ready to be configured. If
express installation mode was chosen earlier as the install method,
then the password is preset to system.
The administrator with no interest in the add-on method of
installation can skip on to Chapter 3, where some of the tools and
concepts that play a central role in Bright Cluster Manager are
introduced. Chapter 4 then explains how to configure and further set
up the cluster.
2.4 Head Node Installation:Add-On Method
An add-on installation, in contrast to the bare metal installation
(section 2.3), is an installation that is done onto a machine that is
already running one of the supported distributions of section 1.1. The
installation of the distribution can therefore be skipped for this
case. However, unlike the bare metal installation, the add-on is not
recommended for inexperienced cluster administrators. This is because
after an add-on installation has been done to the head node, a
software image for the regular nodes must still be installed into a
directory on the head node. The software image is what is provisioned
to regular nodes when they are powered up. The creation and
installation of a software image requires some understanding of the
Linux operating system, and is described in section 10.6.
2.4.1 Prerequisites
For the add-on method:
• The operating system must obviously follow system administration
best practices, so that it works properly with the official
distribution when Bright Cluster Manager is added on.
• The items of software that Bright Cluster Manager adds must be
allowed to overrule in any conflict with what is already installed,
or the end result of the installation cannot be supported.
• A product key is needed.
• There must be repository access to the supported distribution.
– Internet access makes up-to-date repository access possible.
RHEL and SLES repository access requires a subscription from
Red Hat or SUSE (Appendix M).
– For high-security environments without internet access, an
alternative is to mount a DVD device or ISO image containing a
local repository snapshot of the parent distribution on the head
node, and to specify the repository location when running the
installation command. Configuring a local repository is
described in section 10.6.3.
2.4.2 Installing The Installer
To carry out an add-on installation, the bright-installer-6.1 package
must be installed with a package installer.
The bright-installer-6.1 package can be obtained from a Bright
Cluster Manager installation DVD, in the directory /addon/. The file
name is something like bright-installer-6.1-129_cmbright.noarch.rpm
(the exact version number may differ).
After obtaining the package, it is installed as root on the node that
is to become the head node, as follows:
[root@rhel6 ~]# rpm -ivh bright-installer-6.1-129_cmbright.noarch.rpm
Because the installation of the package is done using rpm directly,
and not a dependency resolver such as YUM, some packages may still
need to be installed first. The administrator is prompted by the
installer to install these packages, and they can be installed with
YUM as usual.
Installation progress is logged in /var/log/install-bright.log.
2.4.3 Running The Installer
The Help Text For install-bright
The installer is run with the command install-bright. Running it
without options displays the following help text:
[root@rhel6 ~]# install-bright
USAGE: install-bright <OPTIONS1> [OPTIONS2]
OPTIONS1:
---------
-d | --fromdvd Path to DVD device
-i | --fromiso Path to ISO image
-n | --network Install over network
-l | --localrepo Install using local repository
OPTIONS2:
---------
-h | --help Print this help
-c | --useconfig Use predefined config file
-r | --repodir Create repository in specified directory
-o | --overwrite Overwrite existing repository
Usage Examples For install-bright
• Install Bright Cluster Manager directly from the Bright Computing
repositories over the internet:
install-bright -n
• Install Bright Cluster Manager using a Bright Computing DVD as
the package repository:
install-bright -d /dev/sr0
• Install Bright Cluster Manager using a Bright Computing ISO as the
package repository:
install-bright -i /tmp/bright-centos5.iso
• Install Bright Cluster Manager using a Bright Computing ISO (-i
option) as the package repository, and create the repository (-r
option) in the specified directory:
install-bright -i /tmp/bright-centos5.iso -r /tmp/repodir
• Install Bright Cluster Manager using a Bright Computing ISO as
the package repository (-i option), and create the repository in the
specified directory (-r option), but overwrite contents (-o option)
if the directory already exists:
install-bright -i /tmp/bright-centos5.iso -r /tmp/repodir -o
• Install Bright Cluster Manager from a local repository which has
already been configured. This also assumes that the repo
configuration files for zypper/YUM use are already in place:
install-bright -l
• Install Bright Cluster Manager from a local repository (-l option)
which has already been configured, and specify a path to the
repository directory with the -r option. This can be used when the
repository configuration file has not yet been generated and
created. A repository configuration file is generated, and placed
permanently in the appropriate directory (/etc/yum.repos.d/ or
/etc/zypp/repos.d/):
install-bright -l -r /tmp/repodir
An Installation Run For install-bright
The most common installation option is with an internet connection.
Any required software packages are asked for at the start:
Example
[root@rhel6 ~]# install-bright -n
Please install the follow pre-requisites
----------------------------------------
createrepo
[root@rhel6 ~]# yum install createrepo
...
After all the packages are installed on the head node, the installer
can be run again. It checks for some software conflicts, and warns
about the ones it runs into:
Example
[root@rhel6 ~]# install-bright -n
INFO/ERROR/WARNING:
--------------------
WARNING:
A DHCP daemon is already running. Bright Cluster Manager
provides a customized DHCP server, and will update the
’dhcpd’ configuration files. It is highly recommended
that you stop your existing DHCP server, and let Bright
Cluster Manager configure your dhcp server.
You can also choose to ignore this message, and proceed
with the existing DHCP server, which may or may not work.
--------------------
Continue(c)/Exit(e)? e
[root@rhel6 ~]# /etc/init.d/dhcpd stop
Shutting down dhcpd: [ OK ]
Having resolved potential software conflicts, the product key
(supplied by Bright Computing or its vendor) is entered:
Example
[root@rhel6 ~]# install-bright -n
Bright Cluster Manager Product Key Activation
---------------------------------------------
Product key [XXXXX-XXXXX-XXXXX-XXXXX-XXXXX]: 001323-134122-134134-314384-987986
...
License Parameters
------------------
Country Name (2 letter code) []: US
State or Province Name (full name) []: CA
Locality (city) []: San Francisco
Organization Name (e.g. company) []: Bright
Organization Unit (e.g. department) []: Development
Cluster Name []: bright61
MAC address [??:??:??:??:??:??]: 08:B8:BD:7F:59:4B
Submit certificate request to Bright Computing? [y(yes)/n(no)]: y
Contacting license server... License granted.
License has been installed in /cm/local/apps/cmd/etc/
The software license is displayed, and can be clicked through. A
warning is given about the configuration changes about to take place:
Please be aware that the Bright Cluster Manager will re-write the
following configuration on your system:
- Update network configuration files.
- Start a DHCP server on the management network.
- Update syslog configuration
The software configuration section is then reached. Default Bright
Cluster Manager values are provided, but should normally be changed
to appropriate values for the cluster. Questions asked are:
Management network parameters
-----------------------------
Network Name [internalnet]:
Base Address [10.141.0.0]:
Netmask Bits [16]:
Domain Name [eth.cluster]:
Management interface parameters
-------------------------------
Interface Name [eth0]:
IP Address [10.141.255.254]:
External network parameters
---------------------------
Network Name [externalnet]:
Base Address [DHCP]:
Netmask Bits [24]:
Domain Name []:cm.cluster
External interface parameters
-----------------------------
Interface Name [eth1]:
IP Address [DHCP]:
External name servers list (space separated)
--------------------------------------------
List [10.150.255.254]:
Root password
-------------
Please enter the cluster root password:
MySQL root password
-------------------
Please enter the MySQL root password:
The Bright Cluster Manager packages are then installed and configured.
The stages include, towards the end:
Example
Setting up repositories.....[ OK ]
Installing required packages.....[ OK ]
Updating database authentication.....[ OK ]
Setting up MySQL database.....[ OK ]
Starting syslog.....[ OK ]
Initializing cluster management daemon.....[ OK ]
Generating admin certificates.....[ OK ]
Starting cluster management daemon.....[ OK ]
If all is well, a congratulatory message then shows up, informing the
administrator that Bright Cluster Manager has been installed
successfully, and that the host is now a head node.
Installing The Software Image For Regular Nodes After The
install-bright Installation Run
A functional cluster needs regular nodes to work with the head node.
The regular nodes at this point of the installation still need to be
set up. To do that, a software image (section 3.1.2) must now be
created for the regular nodes on the head node. The regular nodes use
such a software image when they boot up to become a part of the
cluster. A software image can be created using the base tar image
included on the DVD, or as a custom image. The details on how to do
this with cm-create-image are given in section 10.6.
Once the head node and software image have been built, the head
node installation is complete, and the cluster is essentially at the
same stage as that at the end of section 2.3.20 of the bare metal
installation, except that the software image is possibly a more
customized image than the default image provided with the bare-metal
installation.
The Build Configuration Files
This section is mainly intended for administrators who would like to
deploy installations that are pre-configured. It can therefore be
skipped on a first reading.
The build configuration file of a cluster contains the configuration
scheme for a cluster. The bare metal and add-on installations both
generate their own, separate build configuration files, stored in
separate locations.
Most administrators do not deal with a build configuration file
directly, partly because a need to do this arises only in rare and
special cases, and partly because it is easy to make mistakes. An
overview, omitting details, is given here to indicate how the build
configuration file relates to the installations carried out in this
chapter and how it may be used.
The bare metal build configuration file: The file at:
/root/cm/build-config.xml
on the head node contains cluster configuration settings and the list
of distribution packages that are installed during the bare metal
installation. Once the installation has completed, this file is
static, and does not change as the cluster configuration changes.
The add-on installation build configuration file: Similarly, the file
at:
/root/.bright/build-config.xml
contains configuration settings. However, it does not contain a list
of distribution packages. The file is created during the add-on
installation, and if the installation is interrupted, the installation
can be resumed at the point of the last confirmation prior to the
interruption. This is done by using the -c option to install-bright
as follows:
Example
install-bright -c /root/.bright/build-config.xml
Both original “build” configuration XML files can be copied, and
installed via the -i option to cmd. For example:
service cmd stop
cmd -i build-config-copy.xml   # reinitializes CMDaemon from scratch
service cmd start
This overwrites the old configuration. It means that the new cluster
presents the same cluster manager configuration as the old one did
initially. This can only be expected to work with identical hardware,
because of hardware dependency issues.
An XML configuration file can be exported via the -x option to cmd.
For example:
service cmd stop
cmd -x myconfig.xml
service cmd start
Exporting the configuration is sometimes helpful in examining the
XML configuration of an existing cluster after configuration changes
have been made to the original installation. This “snapshot” can
then, for example, be used to customize a build-config.xml file in
order to deploy a custom version of Bright Cluster Manager.
An exported configuration cannot replace the original bare-metal
build-config.xml during the installation procedure. For example, if
the original bare-metal file is replaced by the exported version by
opening up another console with alt-f2, before the point where the
“Start” button is clicked (figure 2.28), then the installation will
fail. This is because the replacement does not contain the list of
packages to be installed.
The exported configuration can however be used after a distribution
is already installed. This is true for a head node that has been
installed from bare metal, and is also true for a head node that has
undergone or is about to undergo an add-on installation. This is
because a head node after a bare-metal installation, and also a head
node that is ready for an add-on installation (or has already had an
add-on installation done), do not rely on a packages list in the XML
file.