NSX Backups failing “HTTP request time out”

“Server failed to respond. HTTP request time out.”

I’ve been seeing this more frequently and thought I’d post something about how to fix this error, which may appear when you click the “Backup & Restore” button on the home page of NSX. It can prevent the list of backups from loading and stop backup jobs from running altogether. As you can see below, NSX failed to load all of the backups in the history:

nsx-backup-issues-1.png

In the past I have seen roughly 100 backups in the history mentioned as the point where NSX starts to struggle to list them. I will try to link official documentation as soon as I find it.

I went and checked how many backups I had and found….. 830!! Just slightly over the (assumed) recommended amount of 100… (Note that ls -la also prints a “total” line plus the . and .. entries, so the real file count is 3 lower than what wc -l reports.)

[root@ifitisnotbroken-file1 NSX1]# ls -lath | wc -l
830
[root@ifitisnotbroken-file1 NSX1]#

To clean this up, I ran the following command, which was taken from here:

find /ifitisnotbrokenbackups/NSX/NSX1-* -mtime +20 -exec rm {} \;

It finds any file that is more than 20 days old (the +20 part) and removes it. After I ran that command I had 82 files left:

[root@ifitisnotbroken-file1 NSX1]# ls -lath | wc -l
82
[root@ifitisnotbroken-file1 NSX1]#

I wanted to lower the number of files/backups so I didn’t have to do the cleanup process as often. I ran the above command but changed the days to 10 and I was left with 43 files. I refreshed the backup page in NSX and it took less than 17 seconds to load and my backups started working without issue:

nsx-backup-issues-2
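Since the find command above deletes files straight away, it can be worth previewing what it would remove first. Here is a minimal sketch of that idea, assuming GNU find (for -maxdepth and -delete). The helper names prune_backups and count_backups are hypothetical and are not part of NSX or of the command I originally used:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list (default) or delete backup files older than N days.
# Usage: prune_backups <dir> <filename-prefix> <days> [--delete]
prune_backups() {
    local dir="$1" prefix="$2" days="$3" mode="${4:-}"
    if [ "$mode" = "--delete" ]; then
        # -type f limits the match to regular files; -delete removes them.
        find "$dir" -maxdepth 1 -type f -name "${prefix}*" -mtime +"$days" -delete
    else
        # Dry run: only print what would be removed.
        find "$dir" -maxdepth 1 -type f -name "${prefix}*" -mtime +"$days" -print
    fi
}

# Count regular files only, so the "total" line and the . and .. entries
# that `ls -la | wc -l` includes are not counted.
count_backups() {
    find "$1" -maxdepth 1 -type f | wc -l
}
```

With the paths from this post, prune_backups /ifitisnotbrokenbackups/NSX NSX1- 20 would only list the candidates, and adding --delete as the last argument would remove them.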

Hope this helps and please let me know if you have any questions!

 

Fenced vApps are broken in vCloud Director 8.x with NSX 6.3.x

 

Updated with a fix!

 

I’m a huge fan of vCloud Director and the amazing product that NSX is, so this is not a knock against vCD or NSX but rather more of a “PSA”. Using vCloud Director and NSX together is fantastic aside from this little bug/known issue. If you use fencing in a vCD 8.x environment with NSX 6.2.x, do not upgrade just yet. Please see the notes here in a VMware KB about why upgrading to NSX 6.3.x in a vCD 8.x environment will break fenced vApps.

Here is a snip from the KB itself:

In NSX 6.3.x Edge Gateways, a new route table is introduced (table 251) which is always looked up first for routes. The main table (table 254) is looked up only when the table 251 does not have a route. The issue occurs because the device routes (a /32 route for vApp VM) are auto-plumbed to the main table, whereas table 251 already has a default route in it. Therefore, since the table 251 already has a default route, the lookup never happens on the main table and hence the fenced vApp virtual machines lose connectivity.

VMware is still working on a fix internally and I’ll try to update this blog once I know more about the fix and when it is released.

 

***UPDATE***

 

Please see the following release doc stating that the fencing issue is fixed in 6.3.2:

http://pubs.vmware.com/Release_Notes/en/nsx/6.3.2/releasenotes_nsx_vsphere_632.html

Enjoy!

Why get VMware Certified and keep it current?

Today I decided to take a step back and think about why I got certified in the first place (way back in 2009). People have very different motivations for getting certified. When I first took a VMware exam (or started using, haha, they are a bit like an addiction), mine was not related to monetary gain or advancement at my workplace. I worked for a very small company and I wanted a way to prove to myself that I actually knew what I was doing, and to prove the same to my employer!

Enter the VCP-4 exam. I felt this was the perfect way to show that I knew all of the features: how to configure ESX(i)/vCenter, which license level gave you which specific features and, most importantly, what all of those cool features did/could do. Working for a small company, I sadly did not have the luxury of getting exams paid for, so when I decided to take my exam I needed to make sure I was ready, had put in the time to study and had found every resource possible to learn all that I could.

On my first attempt I was very nervous and unsure, even though I had spent 2 months going over things I thought I would need to know. I failed. I not only failed, I bombed it. I did not let it get me down for long, and after that I decided to start fresh and go over this “blueprint” thing that I had downloaded but did not pay (enough) attention to. I began reviewing it and looking over various breakdowns of the blueprint on blogs like this one from Simon Long. I took another 3 months to study, really take time to go over the blueprint, and feel comfortable with all topics. During those 3 months I spent time reading VMware documentation that was relevant to the blueprint and learning everything I could.

I lined up another exam date and was ready to give it a go after months of study time. I was still nervous but felt much better about taking it this time. I had gained more confidence after putting in the study time and felt like I could really do this (that did not mean I didn’t get a little flustered whilst taking the exam). I passed this time with flying colors. I felt on top of the world after that.

Moving on from the first pass:

I really started reading more and more after passing that first exam. I set up a home lab and found many answers to my questions on the VMware community page and on Twitter. The community that was around at that time was already really amazing, as people were so helpful and eager to answer questions.

Did this actually help you or your career and why keep taking them?

So, why take these exams and put all of this time in? Well, again I started this to justify my knowledge to myself really. I kept taking these over the years as the versions changed and new exams came out; I wanted to push myself to learn more/new technologies and show it by passing these exams. By passing these exams I got an opportunity to interview for an amazing position. Part of why I got the position is I took the time to pursue these certs on my own and kept learning. It made a huge difference in my career passing these certs and moving into a role where I have had and still have tons of room to grow.

Looking back at this first exam, I really learned what VMware was expecting and that you can’t ignore the blueprint!! The issue I had with this one is that you had to memorize so many config min/max settings that many people found to be useless. Moving on from version 4, I feel that VMware has corrected this issue and is testing for much more “real life” applicable knowledge from people attempting these tests. I also felt that it was a huge benefit to set up a home lab going forward with my career and future exams. Having taken many more exams since the VCP4, I now have a good process for prepping for an exam, which includes reviewing the full blueprint. A fantastic example of covering the blueprint is what Mike Preston did on his blog when he covered his 8 weeks of VCAP, which I used for my first VCAP!

 

Never stop learning!

 

Photon OS vCenter 6.5 deleting EAM folders in /tmp

To anyone who runs into an issue where hosts fail to get prepared with VXLAN by NSX, hopefully this post will help you out. This specific issue, found by a very wise colleague (Mr. Sage), occurs when the EAM (ESX Agent Manager) folders get deleted from the /tmp directory on the Photon OS-based vCenter 6.5 appliance. That causes your hosts to not get prepared by NSX with VXLAN until a workaround is put in place or you restart EAM.

The good news is that there is a workaround. (Please note I’m no expert on this, and implementing this is done at your own risk.)

  1. First, as noted above, you can simply restart EAM. Seems easy enough, but how often do you reboot a host and how often do you really want to restart EAM?
    1. If you do want to restart EAM, you can use the following command to check the status of, stop, or start vmware-eam:

# Use this to check the status of EAM and simply change the "--status" to "--start" or "--stop"

service-control --status vmware-eam

  2. The other workaround is to create a new file named tmp-eam.conf under the directory /usr/lib/tmpfiles.d with the following contents:

# Exclude the following for EAM service
x /tmp/eam*

The above would allow the eam files to stay around until the system is rebooted. Once the vCenter is rebooted EAM would be restarted anyway and the files would be recreated.
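For reference, in tmpfiles.d an x line tells the cleanup to ignore the matching path, which is why the folders survive. Creating the file could be scripted along these lines. This is only a sketch: write_eam_tmpfiles_conf is a hypothetical helper name, and the optional directory argument is there purely so you can try it somewhere harmless before touching the appliance:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the EAM exclusion file described above.
# The target directory defaults to the path from this post; pass another
# directory to test without changing the vCenter appliance.
write_eam_tmpfiles_conf() {
    local target_dir="${1:-/usr/lib/tmpfiles.d}"
    cat > "$target_dir/tmp-eam.conf" <<'EOF'
# Exclude the following for EAM service
x /tmp/eam*
EOF
}
```

Run as root on the vCenter appliance, write_eam_tmpfiles_conf with no arguments writes /usr/lib/tmpfiles.d/tmp-eam.conf. As with the manual steps, this is at your own risk.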

Hope this helps and ping me with any feedback or questions

NSX API tips and guides

Lately I’ve been working more with the API for multiple versions of NSX with my colleague @VirtSouthWest. Here are a couple of API calls that we have been using, which I’d like to keep track of for future configurations and hope will help someone else:

To get started you need a REST API client/plugin. Here is one I use that works with Firefox – RESTCLIENT

Once you have that installed you are ready to connect to your NSX Manager. If you have a self-signed cert you may need to go to the NSX Manager and accept the “not secure connection”. That is something good to check if you get a response like the one below:

api-auth-fail

Once you accept the security warning and ensure you have the correct Authorization and Header in place, you should get a 200 OK response as shown below:

api-auth-200ok
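If you ever need to build the Authorization value by hand, for example when scripting instead of using a browser plugin, Basic authentication is just the base64 encoding of user:password. Here is a minimal sketch, assuming your NSX Manager is using Basic authentication. The helper name basic_auth_header is hypothetical and user/pass are placeholder credentials:

```shell
#!/usr/bin/env bash
# Build an HTTP Basic Authorization header from a username and password.
# Note: GNU base64 wraps its output at 76 characters, so very long
# credentials would need `base64 -w 0`; short ones are unaffected.
basic_auth_header() {
    local user="$1" pass="$2"
    # printf (not echo) avoids encoding a trailing newline.
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$user" "$pass" | base64)"
}

basic_auth_header user pass
# prints: Authorization: Basic dXNlcjpwYXNz
```

Keep in mind that base64 is an encoding and not encryption, so treat the header value like the password itself.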

Here is a sample configuration of what you would send to an NSX Manager API to configure Syslog. Make sure to specify the protocol (TCP/UDP) and the port your syslog server listens on, the standard being 514.

<syslogserver>
<syslogServer>Syslog-Server-FQDN/IP</syslogServer>
<port>514</port> <!-- Port configured on your syslog server -->
<protocol>UDP</protocol> <!-- TCP or UDP -->
</syslogserver>
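The same body can also be sent from a shell with curl instead of a browser plugin. Treat the following as a sketch under assumptions rather than a tested recipe. The hostname is a placeholder, the helper names are hypothetical, and the endpoint path and the application/xml Content-Type are my assumptions, so confirm both in the API guide for your NSX version:

```shell
#!/usr/bin/env bash
# Placeholders: change these for your environment.
NSX_MANAGER="nsxmanager.example.com"   # hypothetical hostname
NSX_USER="admin"

# Assumed endpoint path; confirm it in the API guide for your NSX version.
SYSLOG_PATH="/api/1.0/appliance-management/system/syslogserver"

# Print the XML body for the given syslog server, port and protocol.
syslog_payload() {
    local server="$1" port="$2" protocol="$3"
    cat <<EOF
<syslogserver>
<syslogServer>${server}</syslogServer>
<port>${port}</port>
<protocol>${protocol}</protocol>
</syslogserver>
EOF
}

# Send the body with an HTTP PUT. Nothing is sent until you call this.
put_syslog() {
    # -k skips certificate validation (self-signed cert);
    # -u with only a username makes curl prompt for the password.
    syslog_payload "$@" | curl -k -u "$NSX_USER" -X PUT \
        -H 'Content-Type: application/xml' \
        --data-binary @- \
        "https://${NSX_MANAGER}${SYSLOG_PATH}"
}
```

For example, put_syslog syslog.example.com 514 UDP would send the body shown above with those values filled in.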

Here is a sample configuration of what you would send to an NSX Manager API to configure NTP. You can configure 2 NTP servers using IP or FQDN, which is great for redundancy.


<timeSettings>
<ntpServer>
<string>NTPServer-IP1</string><string>time1.google.com</string> <!-- You can configure 2 NTP servers -->
</ntpServer>
<timezone>UTC</timezone>
</timeSettings>

From my limited experience, the backups are small, ranging from 10-40MB.

Please note that once they reach the destination you configured they stay there; NSX does not currently clean up the backups. That means if you configure a backup job to run daily, after 1 year you will have 365 backups, which can take a while to load on the backup/restore screen. Please configure a job on the destination end to clean up old backups as needed. NSX will reflect these backups being gone and the list will become shorter and load faster.

Here is a sample configuration of what you would send to an NSX Manager API to configure scheduled backups. In the example I have the time scheduled for 19:50, and you can configure the backup time for each manager. I have mine set to be staggered every 5 minutes.

Replace the following fields (snip from the API Guide 6.2 below):

transferProtocol: FTP, SFTP

frequency: weekly, daily, hourly

dayOfWeek: SUNDAY, MONDAY, …, SATURDAY (not in my example below)

hourOfDay: 0 to 23

minuteOfHour: 0 to 59

excludeTables: AUDIT_LOG, SYSTEM_EVENTS, FLOW_RECORDS

The tables specified in the excludeTables parameter are not backed up. Note that this list spells the first table AUDIT_LOG while the example below uses AUDIT_LOGS, so confirm the exact name in the API guide for your version.

<backupRestoreSettings>
<ftpSettings>
<transferProtocol>FTP</transferProtocol>
<hostNameIPAddress>Backup-Destination/IP-Address</hostNameIPAddress>
<port>21</port>
<userName>FTPUSER</userName><password>Password-for-FTPUSER</password>
<passPhrase>passPhrase</passPhrase> <!-- Needed to restore from the backup file -->
<backupDirectory>NSXBackupDir/</backupDirectory>
<filenamePrefix>NSX-Manager1-</filenamePrefix>
<passiveMode>true</passiveMode>
<useEPRT>false</useEPRT>
<useEPSV>true</useEPSV>
</ftpSettings>
<backupFrequency>
<frequency>DAILY</frequency>
<hourOfDay>19</hourOfDay>
<minuteOfHour>50</minuteOfHour>
</backupFrequency>
<excludeTables>
<excludeTable>AUDIT_LOGS</excludeTable>
<excludeTable>SYSTEM_EVENTS</excludeTable>
</excludeTables>
</backupRestoreSettings>
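If you have several managers and want to stagger them every 5 minutes like I do, the hourOfDay and minuteOfHour values for each one are simple arithmetic. Here is a small sketch with a hypothetical helper named staggered_time. The start time of 19:50 comes from the example above, while the count of four managers is just for illustration:

```shell
#!/usr/bin/env bash
# Print "hourOfDay minuteOfHour" for the Nth manager (0-based), starting
# from a base time and adding a fixed number of minutes per manager.
# Pass numbers without leading zeros (8, not 08), since the shell would
# otherwise read them as octal.
staggered_time() {
    local base_hour="$1" base_minute="$2" step="$3" index="$4"
    # Work in minutes since midnight and wrap around at 24 hours (1440 minutes).
    local total=$(( (base_hour * 60 + base_minute + step * index) % 1440 ))
    echo "$(( total / 60 )) $(( total % 60 ))"
}

for i in 0 1 2 3; do
    staggered_time 19 50 5 "$i"
done
# prints:
# 19 50
# 19 55
# 20 0
# 20 5
```

The two numbers on each line go into the hourOfDay and minuteOfHour fields of that manager's backupFrequency section.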

There are many other things you can do via the NSX API, and the above are just some calls to get started. You can create controllers, controller backups, edges, etc. Please review the guides below for the version you have.

API Guides link for different versions:

NSX 6.0.4 Guide

NSX 6.2 Guide

Certification upgrade paths

So I spend a good amount of time (read: far too much time) on the VMware Education blog site. It is a great place to get the current information on new courses, videos, free labs and certification news. I was looking over some older posts before taking my VCAP6-DCV Deployment exam and found this post. It is about getting your DCV certifications upgraded to version 6 and the new VCIX.

After reading the above I think it is interesting that, if I am reading this post correctly and I like to think I am, you can take either the Deploy or Design exam to upgrade your version 5 VCAP-DCV certs, depending on which v5 exams you’ve passed. To quote the page:

“To upgrade from a VCAP5, complete the alternate VCAP6 certification. For instance, a VCAP5-DCA plus a VCAP6-DCV Design would earn you the VCIX6 designation.”

So having both the VCAP5-DCA + DCD and having passed the VCAP6-DCV Deployment exam I should get an upgrade to the VCIX shortly. I’ll post back my experience on how long this takes.

 

**********************

Update: I opened a case and have received a reply stating I also need the VCP6 passed for the upgrade to happen. I have yet to see that via a public doc and per the attached screenshot I do not feel this is the case. More to come:

vcap-upgrade

**********************

Update: After I provided more info and documentation, it was decided that you do not need the VCP6 in order to get the VCIX6-DCV. I have been informed this will reflect in my transcript in the near future! Hopefully this does not happen to anyone else going forward.

VCAP6-DCV Deployment passed!


A few days ago I sat and passed the fully released version of the VCAP6-DCV Deploy exam! I failed the Beta version of this exam a few months ago, but even then I felt this was a good exam that covered great topics. Since I have already passed the VCAP5-DCA/DCD, this should upgrade my certs to the VCIX6-DCV.

Here is a quick breakdown of resources I used, my experience and notes I can share that are important before sitting this exam:

Review the new platform interface “disclaimer”! There are some tricky limits that you need to be aware of before sitting this. At the time I am typing this, the Control, Alt and Backspace keys do not work. This means that if you mistype something and instinctively hit Backspace to correct your error, you will not be able to. If you want to use that nifty Ctrl + C and then Ctrl + V to, say, copy and paste something, that will not work either. Hopefully this is changed in the future. Also check your screen resolution. I overlooked this for 2 hours and had a terrible time with scaling.

Resources used:

Much like other exams, the breakdown/blueprint is key. I go over the blueprint for each exam since they can literally touch on any topic listed. A great breakdown comes from Kyle Jenner’s study guide, which can be found at vJenner.com, and you can find him on Twitter @kylejenneruk. Another great resource is Pluralsight’s video training. There are Slack and Google study groups that you can join as well. Building a home lab for this or any exam is helpful, I think. Having said that, I’m lucky enough to work with many of the VMware products covered in this exam in my current role. There are also Hands-on Labs that you can use if you do not have a home lab.

Exam experience:

My experience was really good from a performance side of things. The new platform works well aside from the issues noted above. I wasted so much time since I forgot about changing the resolution. Be careful with your time management. I spent a good amount of time on the first few questions without realizing that an hour had passed, and I ended up having 8 questions left with about 35 minutes remaining. Keep an eye on that clock! Do not give up even if you are short on time; see what you can do! I felt like the topics covered were fair and things a well-rounded VI admin should know or at least have tinkered with to keep current.

Hopefully this is helpful. Please reach out to me on Twitter/Slack/LinkedIn/email if you have questions.