Ironic is a Nova driver that allows you to deploy instances to bare metal servers without a hypervisor. Operating Ironic means you have all the usual operational issues of an OpenStack cloud, plus the issues of constantly dealing with hardware.
Josh is an engineer on Rackspace OnMetal, a multitenant Ironic cloud. He has been developing and operating OnMetal for about a year. All OnMetal engineers are in a week-long on-call rotation, so we operate the software we help write. During an on-call shift, engineers are expected to improve the operability of OnMetal and Ironic for that entire week, rather than adding new features. Since we operate the code, we feel the same pain that many operators feel, and attempt to write code that avoids operator frustration. If our code fails at 2am, we're the ones getting woken up. In the course of operating Ironic for the past year, we’ve encountered many pain points and developed fixes for most of them.
In this talk, I’ll:
- Demo and discuss our scripts and automation for some of our pain points with Ironic
- Demo the dashboard we developed to visualize our Ironic cloud and discuss the trends in errors it showed
- Detail how we use the Ironic Python Agent to improve operability and how you can avoid some of the gotchas when deploying Ironic with IPA
- Discuss some of the scaling issues we’ve encountered and how to avoid them
- Discuss the changes we've submitted to Ironic to improve manageability
- Discuss the ongoing pain points and how to avoid them