If you cannot complete the D2iQ tutorial TensorFlow example, or any step in your own TensorFlow notebooks, check whether the following error appears:
Kernel Restarting
The kernel for <notebook-name-here> with TensorFlow.ipynb appears to have died. It will restart automatically.
This message does not explain why the kernel died, but there is an easy way to check whether the cause is the CPU of the host that the notebook pod is running on. Open a new terminal window inside the notebook and type:
lscpu
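In the output of the command, check that the Flags line lists the AVX instruction set (avx). Prebuilt TensorFlow binaries are compiled to use AVX instructions, so they crash on a CPU that does not expose them. If AVX is not listed, first check whether your CPU supports it. Let's say your CPU is listed as: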
Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
And flags such as:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes lahf_lm tsc_adjust arat
The avx flag is not listed, so let's check the official Intel specification sheet for this CPU:
https://ark.intel.com/content/www/us/en/ark/products/75790/intel-xeon-processor-e52630-v2-15m-cache-2-60-ghz.html
We can see that this CPU does in fact support AVX:
Instruction Set Extensions Intel® AVX
This indicates that the CPU is defective or misconfigured, or that the hypervisor has disabled this extension. You must fix this before TensorFlow notebooks will run properly on this VM. A common cause of a disabled CPU instruction set such as AVX is that the physical server belongs to a virtualization cluster whose effective CPU generation is set lower cluster-wide for compatibility between hosts. Contact your hypervisor vendor if you are unsure whether this is the case.
After resolving the issue, run lscpu again. The avx flag should now appear in the output:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm fsgsbase tsc_adjust smep arat
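The same check can also be run from a notebook cell without importing TensorFlow, which is useful because the import itself may be what kills the kernel. The following is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo is readable inside the pod. The helper names cpu_flags and has_avx are hypothetical and are not part of any D2iQ or TensorFlow API.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in lscpu or /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # lscpu prints "Flags:", while /proc/cpuinfo prints "flags" followed by tabs.
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    # No flags line found (for example, on a non-x86 host).
    return set()


def has_avx(cpuinfo_text):
    """True only if the exact token 'avx' is listed (avx2 alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux host
    if cpuinfo.exists():
        print("AVX available:", has_avx(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; run lscpu in a terminal instead")
```

If this reports that AVX is not available, the flag is missing inside the VM and the hypervisor or cluster CPU compatibility settings described above are the place to look.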