Overview
To troubleshoot DC/OS effectively, you may need to gather logs from all DC/OS components. The following instructions explain how to do that.
How To
In DC/OS 2.0 and later, run these commands:
dcos diagnostics create
dcos diagnostics wait (this may take some time to complete)
dcos diagnostics list
dcos diagnostics download
In DC/OS 1.13 and earlier, run these commands:
dcos node diagnostics create all
dcos node diagnostics --status (wait for the progress to reach 100%)
dcos node diagnostics download <bundle_name>
When you check the status, you may see "Diagnostics job failed". This is often expected and usually means only that one or more endpoints were not reachable. Keep waiting until the progress reaches 100%, then download the bundle anyway: it will likely still contain logs and other information that are useful for troubleshooting.
Please note: In situations where a full log bundle is not required, you can replace "all" in the create command with any of the following: an IP address, a hostname, a Mesos ID, or the keyword "masters" or "agents".
For example, if the full bundle is too large to download in a reasonable time, you can gather just the masters and the node that’s experiencing an issue.
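For DC/OS 1.13 and earlier, the "masters plus the affected node" case might look like the sketch below. It assumes that `dcos node diagnostics create` accepts several space-separated node selectors in one call; `targeted_bundle` is a hypothetical helper name, and the node address is yours to supply.

```shell
# Hypothetical helper: request a diagnostics bundle for all masters plus one
# affected node (DC/OS 1.13 and earlier CLI).
# $1 is the affected node's IP address, hostname, or Mesos ID.
targeted_bundle() {
  node="${1:?usage: targeted_bundle <ip|hostname|mesos-id>}"
  # Assumption: the CLI accepts multiple node selectors in a single create call.
  dcos node diagnostics create masters "$node"
}
```

For example, `targeted_bundle 10.0.0.5` (an illustrative IP), followed by the usual `dcos node diagnostics --status` and `dcos node diagnostics download <bundle_name>` steps.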
Bundle Not Working?
A bundle is preferable, but in some cases it is not possible to generate a complete one. In those cases, you can collect the logs from the affected node(s) with the following one-liner. SSH to each node directly and run:
d=$(date -u +%Y%m%d-%H%M%S) && tmp_dir=/tmp/dcos_diagnostics-${d} && if sudo systemctl | grep dcos | grep master > /dev/null; then node_type=master; elif sudo systemctl | grep dcos | grep public > /dev/null; then node_type=agent_public; else node_type=agent; fi; node_dir=${tmp_dir}/$(/opt/mesosphere/bin/detect_ip)_${node_type} && mkdir -p ${node_dir} && sudo dmesg -T > ${node_dir}/dmesg-0.output && for unit in $(sudo systemctl list-units --no-legend --no-pager --plain 'dcos-*' | awk '{print $1}'); do echo "Saving logs for ${unit}"; sudo journalctl -au ${unit} > ${node_dir}/${unit}; done && tar -czvf $(/opt/mesosphere/bin/detect_ip)_${node_type}-${d}.tgz -C $tmp_dir .
This produces a file named `{node IP}_{node type}-{date}.tgz` in the current working directory, which you can then upload to the ticket.
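Before uploading, you may want to confirm that the archive is readable and actually contains logs. The following is a minimal sketch; `check_archive` is a hypothetical helper, and it only checks for the `dmesg-0.output` file and the per-unit `dcos-*` journal files that the one-liner writes.

```shell
# Hypothetical helper: sanity-check an archive produced by the one-liner.
check_archive() {
  archive="${1:?usage: check_archive <file.tgz>}"
  # List the archive members once; fail if tar cannot read the file.
  members=$(tar -tzf "$archive") || return 1
  # The one-liner always writes dmesg-0.output into the node directory.
  if ! printf '%s\n' "$members" | grep -q '/dmesg-0\.output$'; then
    echo "missing dmesg-0.output" >&2
    return 1
  fi
  # Each dcos-* unit's journal is saved under its unit name,
  # e.g. dcos-mesos-master.service.
  count=$(printf '%s\n' "$members" | grep -c '/dcos-[^/]*$')
  echo "$archive: $count dcos-* unit logs"
  # Succeed only if at least one unit log was captured.
  [ "$count" -gt 0 ]
}
```

For example, `check_archive 10.0.0.1_master-20200101-120000.tgz` (an illustrative file name) prints the number of unit logs found and returns non-zero if the archive looks incomplete.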