The Truth, the whole truth…
So here are the errors that I encountered during this installation.
Error – 1
When running the configuration processor, it will accept a short encryption key and doesn't warn or error about it until the end of the run.
So I re-run the configuration processor with a stronger encryption password, 'H3lionhelion!'.
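For reference, the re-run with the new key looks something like this (a sketch only, assuming the standard HOS 2.1 encrypt/rekey variables on the config-processor playbook; substitute your own key):

cd ~/helion/hos/ansible
# Supply the stronger encryption key; rekey stays empty since nothing is being re-encrypted yet
ansible-playbook -i hosts/localhost config-processor-run.yml -e encrypt="H3lionhelion!" -e rekey=""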
And so we move on to the next configuration that needs to be debugged…
Error – 2
cd ~/helion/hos/ansible
ansible-playbook -i hosts/localhost config-processor-run.yml
Looking at the network_groups.yml file, I can see I forgot to modify this section.
I added 'hos2.allthingscloud.eu' as the external name.
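For illustration, the relevant fragment of network_groups.yml ends up looking roughly like this (a sketch based on the default HOS 2.1 example input model; the load-balancer provider, name and role values are assumptions):

network-groups:
  - name: EXTERNAL-API
    load-balancers:
      - provider: ip-cluster
        name: extlb
        # FQDN that clients will use to reach the public API endpoints
        external-name: hos2.allthingscloud.eu
        roles:
          - public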
Error – 3
It’s also complaining about my nic_mapping profiles assigned in the servers.yml file.
This profile was missing from the nic_mappings file.
I added the correct profile, HP-DL360-8PORT, to the nic_mappings file.
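Roughly, the added entry in nic_mappings.yml looks like this (a sketch only: the logical names and PCI bus addresses below are placeholders and must match what the DL360's hardware actually reports, e.g. from lspci, with one entry per port up to hed8):

nic-mappings:
  - name: HP-DL360-8PORT
    physical-ports:
      - logical-name: hed1
        type: simple-port
        bus-address: "0000:02:00.0"
      - logical-name: hed2
        type: simple-port
        bus-address: "0000:02:00.1"
      # ...continue through hed8, one entry per physical port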
Now we need to recommit all these changes to the repository and start again.
cd ~/helion/hos/ansible
git add -A
git commit -m "Fixed initial configuration errors"
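With the fixes committed, the usual HOS 2.1 sequence (as I understand it) is to re-run the configuration processor and then ready-deployment.yml, which regenerates the ~/scratch/ansible/next/hos/ansible area used below:

# Re-run the configuration processor against the corrected input model
ansible-playbook -i hosts/localhost config-processor-run.yml
# Regenerate the deployment area consumed by the site playbooks
ansible-playbook -i hosts/localhost ready-deployment.yml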
Error – 4
cd ~/scratch/ansible/next/hos/ansible
ansible-playbook -i hosts/verb_hosts wipe_disks.yml
I have the following ERROR when trying to wipe the disks
This is because I encrypted the sensitive content. To overcome this, I need to supply --ask-vault-pass as part of the command line:
ansible-playbook -i hosts/verb_hosts wipe_disks.yml --ask-vault-pass
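Alternatively, if you would rather not be prompted on every run, ansible-playbook also accepts --vault-password-file; a sketch, assuming the vault password is the config-processor encryption key:

# Store the vault password once, readable only by the current user
echo 'H3lionhelion!' > ~/.vault_pass.txt
chmod 600 ~/.vault_pass.txt
ansible-playbook -i hosts/verb_hosts wipe_disks.yml --vault-password-file ~/.vault_pass.txt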
Error – 5
And just when I think I’ve almost finished I get another failure as follows:
There are lots of complaints in ~/.ansible/ansible.log about corrupt disk partitions. I attempted to "manually" clear these using the following script:
clear_host() {
  ssh $1 << EOF
echo onhost connected
sudo /bin/dd if=/dev/zero of=/dev/sdb bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sdc bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sdd bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sde bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sdf bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sdg bs=512 count=2
sudo /bin/dd if=/dev/zero of=/dev/sdh bs=512 count=2
sync
EOF
}
export -f clear_host

seq 15 17 | while read i; do
  clear_host 172.16.60.$i
done
This also made no difference. I get the same partition error.
My next attempt at a fix will be to log on to the RAID controller on each Ceph server and delete and re-create the non-OS drives.
Now I'll delete all arrays EXCEPT Array A, the OS drive.
Repeat this for Arrays B to G; you should end up with something like this:
Now rebuild all the RAID 0 drive arrays.
Select the “Create Arrays with RAID 0” option
Select OK
Repeat this process on the other two Ceph nodes and then we can relaunch the deployment:
ansible-playbook -i hosts/verb_hosts site.yml --ask-vault-pass --limit @/home/graham/site.retry
Once again we have the exact same failure, so it's time to look for known bugs…
Yes, this is a known bug: the wipe_disks functionality does not always work correctly.
It's necessary to log on to each node and run the following command against each journal and OSD drive:
/sbin/sgdisk --zap-all -- /dev/sd[b-h]
Or use the following script:
clear_host() {
  ssh $1 << EOF
echo onhost connected
sudo /sbin/sgdisk --zap-all -- /dev/sdb
sudo /sbin/sgdisk --zap-all -- /dev/sdc
sudo /sbin/sgdisk --zap-all -- /dev/sdd
sudo /sbin/sgdisk --zap-all -- /dev/sde
sudo /sbin/sgdisk --zap-all -- /dev/sdf
sudo /sbin/sgdisk --zap-all -- /dev/sdg
sudo /sbin/sgdisk --zap-all -- /dev/sdh
sync
EOF
}
export -f clear_host

seq 15 17 | while read i; do
  clear_host 172.16.60.$i
done
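Before relaunching the (long-running) deployment it's worth a quick sanity check that the zap actually took, for example:

# Each data disk should now show no defined partitions (repeat for sdc-sdh and the other nodes)
ssh 172.16.60.15 "sudo /sbin/sgdisk -p /dev/sdb"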
Error – 6
And now we continue from where we left off:
ansible-playbook -i hosts/verb_hosts site.yml --ask-vault-pass --limit @/home/graham/site.retry
This brings me to the next challenge:
As you can see, this is complaining about authentication. What you can't see is that over 12 hours have passed since I re-joined the original failed screen session. By appending --limit @/home/graham/site.retry it attempts to carry on from where it failed. However, it looks as though some authentication tokens may have subsequently expired.
Re-launch the installation without the "--limit @/home/graham/site.retry" option:
ansible-playbook -i hosts/verb_hosts site.yml --ask-vault-pass
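Given how long a full run takes, and having just been bitten by an expired session, it pays to launch this inside a fresh screen session, e.g.:

# Start a named screen session on the deployer
screen -S hos-deploy
# Inside the session:
cd ~/scratch/ansible/next/hos/ansible
ansible-playbook -i hosts/verb_hosts site.yml --ask-vault-pass
# Detach with Ctrl-a d; re-attach later with: screen -r hos-deploy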