Ansible pull fails on compute node - work tree dir File exists #76

Open
christopheredsall opened this issue Jun 13, 2020 · 3 comments

@christopheredsall
Contributor

On a newly built cluster using clusterinthecloud/terraform@e313404 with the default "4" branch of ACRC/slurm-ansible-playbook, submitting a job to start the node results in the following /root/ansible-pull.log:

Starting Ansible Pull at 2020-06-13 16:27:40
/usr/bin/ansible-pull --url=https://github.com/ACRC/slurm-ansible-playbook.git --checkout=4 --inventory=/root/hosts compute.yml
 [WARNING]: Platform linux on host vm-gpu3-2-ad2-0001 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change this. See https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html for more information.
vm-gpu3-2-ad2-0001 | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    }, 
    "changed": false, 
    "cmd": "/usr/bin/git clone --origin origin https://github.com/ACRC/slurm-ansible-playbook.git /root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com", 
    "msg": "fatal: could not create work tree dir '/root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com'.: File exists", 
    "rc": 128, 
    "stderr": "fatal: could not create work tree dir '/root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com'.: File exists\n", 
    "stderr_lines": [
        "fatal: could not create work tree dir '/root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com'.: File exists"
    ], 
    "stdout": "", 
    "stdout_lines": []
}
 [WARNING]: Your git version is too old to fully support the depth argument.
Falling back to full checkouts.
 [WARNING]: Platform linux on host vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change this. See https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html for more information.
vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com | CHANGED => {
    "after": "fe5fc5bb46fec69c6db9465782793f312134a3f5", 
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    }, 
    "before": null, 
    "changed": true
}
@christopheredsall
Contributor Author

christopheredsall commented Jun 13, 2020

Indeed, the directory exists:

[root@vm-gpu3-2-ad2-0001 ~]# ls -ld /root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com
drwxr-xr-x. 6 root root 4096 Jun 13 16:27 /root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com

Moving it aside and re-running ansible-pull:

[root@vm-gpu3-2-ad2-0001 ~]# mv /root/.ansible/pull/vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com /root/.ansible/pull/BROKEN-vm-gpu3-2-ad2-0001.subnet.clustervcn.oraclevcn.com
[root@vm-gpu3-2-ad2-0001 ~]# /usr/bin/ansible-pull --url=https://github.com/ACRC/slurm-ansible-playbook.git --checkout=4 --inventory=/root/hosts compute.yml

This results in exactly the same error and log output.

@christopheredsall
Contributor Author

@milliams
Member

I've seen this before and I'm still not sure what causes it. It seems that the cloud-init script is sometimes started twice.
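One quick way to test the "started twice" theory on an affected node is to count how many start markers ended up in the log. This is a minimal sketch, assuming the wrapper appends (rather than overwrites) a "Starting Ansible Pull at ..." line to /root/ansible-pull.log on every start, as in the log above. If the log is truncated on each start, a count of 1 proves nothing. The count_pull_starts helper is hypothetical and not part of the playbook.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper: count how often the pull wrapper started.
# Assumes each start appends a "Starting Ansible Pull at ..." line to the log.
set -euo pipefail

count_pull_starts() {
    # Default to the log path from this issue; allow an override for testing.
    local log="${1:-/root/ansible-pull.log}"
    # grep -c prints the number of matching lines; it exits 1 when the count
    # is 0, so "|| true" keeps set -e from aborting in that case.
    grep -c '^Starting Ansible Pull at ' "$log" || true
}
```

For example, `[ "$(count_pull_starts)" -gt 1 ] && echo "ansible-pull was started more than once"` flags a node where the wrapper ran twice.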

With the new work to pre-generate images this will be less of a problem, but adding a file lock to prevent the race condition could help too.
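A file lock along these lines could be sketched with flock(1) from util-linux, using the subshell-plus-file-descriptor idiom from its man page. This assumes flock is installed on the compute image and that the cloud-init script can be changed to call ansible-pull through the wrapper. The run_locked helper and the /var/lock/ansible-pull.lock path are hypothetical choices, not something the playbook provides.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: allow only one ansible-pull run at a time per node.
set -euo pipefail

# Lock file location is an assumption; override via the environment if needed.
LOCKFILE="${LOCKFILE:-/var/lock/ansible-pull.lock}"

run_locked() {
    (
        # -n: fail immediately if another process already holds the lock,
        # instead of waiting and then pulling a second time.
        flock -n 9 || { echo "another run holds $LOCKFILE; skipping" >&2; exit 1; }
        # Run the wrapped command while fd 9 (and so the lock) is held.
        "$@"
    ) 9>"$LOCKFILE"  # fd 9 is opened for the subshell; the lock is released when it exits
}
```

Usage, with the command from the original report: `run_locked /usr/bin/ansible-pull --url=https://github.com/ACRC/slurm-ansible-playbook.git --checkout=4 --inventory=/root/hosts compute.yml`. Using `-n` makes a duplicate start exit at once. Dropping it would make the second start wait and then run the pull again after the first finishes, which may or may not be wanted.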
