The Truth, the whole truth…
So here are the errors that I encountered during this installation.
Error – 1
When running the configuration processor, it will accept a short encryption key and doesn’t warn or error until the end of the run.
I re-ran the configuration processor with a stronger encryption password, ‘H3lionhelion!’.
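For reference, the configuration processor is driven through ansible on the lifecycle-manager node. A sketch of the rerun, assuming the standard Helion lifecycle-manager layout (the path and the empty rekey argument are assumptions for this install):

```shell
cd ~/helion/hos/ansible   # lifecycle-manager path assumed
# -e encrypt sets the vault password used to protect sensitive output;
# -e rekey is left empty because we are not rotating an existing key.
ansible-playbook -i hosts/localhost config-processor-run.yml \
    -e encrypt="H3lionhelion!" -e rekey=""
# Regenerate the scratch deployment area from the new output.
ansible-playbook -i hosts/localhost ready-deployment.yml
```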
And so we move on to the next configuration that needs to be debugged…
Error – 2
Looking at the network_groups.yml file, I can see that I forgot to modify this section.
I added ‘hos2.allthingscloud.eu’ as the external name
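In the input model, the external name sits on the load-balancer entry in network_groups.yml. A hypothetical excerpt (the group and load-balancer names here are assumptions based on the standard input-model layout; only the external-name value comes from this install):

```yaml
network-groups:
  - name: EXTERNAL-API        # group name assumed
    load-balancers:
      - provider: ip-cluster
        name: extlb           # load-balancer name assumed
        external-name: hos2.allthingscloud.eu
        roles:
          - public
```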
Error – 3
It’s also complaining about my nic_mapping profiles assigned in the servers.yml file.
This profile was missing from the nic_mappings file, so I added the correct one, HP-DL360-8PORT.
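A nic_mappings entry ties each logical NIC name to a PCI bus address. A hypothetical sketch of the added profile (the bus addresses and logical names are placeholders; only the profile name HP-DL360-8PORT comes from this install):

```yaml
nic-mappings:
  - name: HP-DL360-8PORT
    physical-ports:
      - logical-name: hed1
        type: simple-port
        bus-address: "0000:02:00.0"   # placeholder bus address
      - logical-name: hed2
        type: simple-port
        bus-address: "0000:02:00.1"   # placeholder bus address
      # ...remaining six ports follow the same pattern
```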
Now we need to recommit all these changes to the repository and start again.
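The recommit itself is just git on the lifecycle-manager node; a sketch, assuming the input model lives in the usual ~/helion checkout:

```shell
cd ~/helion   # input-model checkout location assumed
git add -A
git commit -m "Fix external-name and add HP-DL360-8PORT nic-mapping"
# Then re-run the configuration processor and ready-deployment as before.
```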
Error – 4
I hit the following ERROR when trying to wipe the disks:
This is because I encrypted the sensitive content – to overcome this I need to supply the --ask-vault-pass option on the command line.
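In other words, any playbook that reads the encrypted configuration now needs the vault password. A sketch of the wipe run (the scratch path and inventory name are assumptions based on the standard layout):

```shell
cd ~/scratch/ansible/next/hos/ansible   # scratch area path assumed
# --ask-vault-pass prompts for the password set during config-processor-run
ansible-playbook -i hosts/verb_hosts wipe_disks.yml --ask-vault-pass
```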
Error – 5
And just when I think I’ve almost finished I get another failure as follows:
Lots of complaints in ~/.ansible/ansible.log about corrupt disk partitions – I attempted to “manually” clear these using the following script:
This also made no difference. I get the same partition error.
My next attempt at a fix will be to log on to the RAID controller on each Ceph server and delete and re-create the non-OS drives.
Now I’ll delete all Arrays EXCEPT Array A – the OS drive
Repeat this for Arrays B through G; you should end up with something like this:
Now rebuild all the RAID 0 drive arrays.
Select the “Create Arrays with RAID 0” option.
Repeat this process on the other two Ceph nodes, and then we can relaunch the deployment.
Once again we have the exact same failure – time to look for known bugs…
Yes – this is a known bug: the wipedisk functionality does not always work correctly.
It’s necessary to log on to each node and run the following command against each journal and osd drive:
…or use the following script:
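The author’s script isn’t shown here; as an illustration only, a minimal dry-run sketch of the kind of per-drive cleanup involved (the device names are assumptions, and the echoes keep it harmless – review the printed commands, then remove the echoes to actually wipe the drives):

```shell
#!/bin/bash
# Hypothetical sketch (not the original script): clear partition structures on
# each journal/OSD drive. Printed as a dry run -- remove the echoes to execute.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    echo sgdisk --zap-all "$dev"                   # wipe GPT and MBR tables
    echo dd if=/dev/zero of="$dev" bs=1M count=10  # zero leftover metadata
done
```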
Error – 6
And now we continue on where we left off –
This brings me to the next challenge –
As you can see, this is complaining about authentication. What you can’t see is that over 12 hours have passed since I re-joined the original failed screen session. Appending --limit @/home/graham/site.retry makes ansible carry on from where it failed; however, it looks as though some authentication tokens have since expired.
Re-launch the installation without the --limit @/home/graham/site.retry option.
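So the fix is simply a fresh full run; a sketch, with the path and playbook name assumed as before:

```shell
cd ~/scratch/ansible/next/hos/ansible   # scratch area path assumed
# Full site run, no retry file; the vault password is still needed
# because the configuration remains encrypted.
ansible-playbook -i hosts/verb_hosts site.yml --ask-vault-pass
```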