VMware Cloud Director 10.0 – 10.2 Multi Site: How to remove a site

Building off of this amazing post on how to configure multi site, I was working on retiring a site in VCD 10.2.x and discovered there is no option to disconnect a site in the UI until 10.3.x. When retiring an instance for any reason and needing to disconnect it from a Multi Site configuration, this is the process:

Authenticate to the VCD API following this KB (I always forget these steps using Postman…). This should be done at the provider level with a SYSADMIN level account: https://kb.vmware.com/s/article/56948

Once you have the bearer token from the above step, run the following request to see which site(s) the VCD instance is linked to, changing the host name to match the instance in question.

Make sure to add the following header as well

Content-Type            application/vnd.vmware.admin.siteAssociation+xml

GET https://bblab-vcd02.com/api/site/associations/

Capture the body that is returned, which lists the instance(s) this site is connected to and their current state. The return body may be large, but it is required for the next steps.

Locate the line that states “remove” and note the SITE_UUID at the end of its href, similar to my example:

<Link rel="remove" href="https://bblab-vcd01.com/api/site/associations/c913fce4-e24b-4c38-83aa-3f66c8ae8ae7" type="application/vnd.vmware.admin.siteAssociation+xml"/>

Add that SITE_UUID to the end of your URL, so the Postman URL becomes: https://bblab-vcd02.com/api/site/associations/c913fce4-e24b-4c38-83aa-3f66c8ae8ae7
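If the return body is too large to scan by eye, the remove link can be pulled out with a few lines of Python. This is only a sketch: the sample body is a trimmed, hypothetical stand-in for the real response, and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Namespace used by the VCD XML responses shown in this post.
VCLOUD_NS = "http://www.vmware.com/vcloud/v1.5"


def find_remove_links(associations_xml):
    """Return the href of every <Link rel="remove"> in the response body."""
    root = ET.fromstring(associations_xml)
    return [
        link.attrib["href"]
        for link in root.iter("{%s}Link" % VCLOUD_NS)
        if link.attrib.get("rel") == "remove"
    ]


# Trimmed, hypothetical response body for illustration only.
sample = """<SiteAssociations xmlns="http://www.vmware.com/vcloud/v1.5">
  <SiteAssociationMember>
    <Link rel="remove"
          href="https://bblab-vcd01.com/api/site/associations/c913fce4-e24b-4c38-83aa-3f66c8ae8ae7"
          type="application/vnd.vmware.admin.siteAssociation+xml"/>
  </SiteAssociationMember>
</SiteAssociations>"""

for href in find_remove_links(sample):
    # The SITE_UUID is the last path segment of the href.
    print(href.rsplit("/", 1)[-1])
```

The last path segment of each href is the SITE_UUID to append to the associations URL.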

Within the return body above, capture from <SiteAssociationMember to -----END PUBLIC KEY-----</PublicKey> and paste that into the body as RAW XML as shown below for removing bblab-vcd01 from bblab-vcd02.

Now run that as a DELETE and the return should look similar to the following:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Task xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1" xmlns:vssd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData" xmlns:common="http://schemas.dmtf.org/wbem/wscim/1/common" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:vmw="http://www.vmware.com/schema/ovf" xmlns:ovfenv="http://schemas.dmtf.org/ovf/environment/1" xmlns:vmext="http://www.vmware.com/vcloud/extension/v1.5" xmlns:ns9="http://www.vmware.com/vcloud/versions" cancelRequested="false" expiryTime="2021-12-23T01:14:57.381Z" operation="Updating Site bblab-vcd02.com(c913fce4-e24b-4c38-83aa-3f66c8ae8ae7)" operationName="siteUpdate" serviceNamespace="com.vmware.vcloud" startTime="2021-12-09T01:14:57.381Z" status="queued" name="task" id="urn:vcloud:task:8d84c54f-d121-4332-8de0-5700a63696cf" href="https://bblab-vcd02.com/api/task/8d84c54f-d121-4332-8de0-5700a63696cf" type="application/vnd.vmware.vcloud.task+xml">    <Link rel="edit" href="https://bblab-vcd02.com/api/task/8d84c54f-d121-4332-8de0-5700a63696cf" name="task" type="application/vnd.vmware.vcloud.task+xml"/>    <Link rel="edit" href="https://bblab-vcd02.com/api/task/8d84c54f-d121-4332-8de0-5700a63696cf" name="task" type="application/vnd.vmware.vcloud.task+json"/>    <Owner href="https://bblab-vcd02.com/api/site/c913fce4-e24b-4c38-83aa-3f66c8ae8ae7" id="urn:vcloud:site:c913fce4-e24b-4c38-83aa-3f66c8ae8ae7" name="bblab-vcd02.com" type="application/vnd.vmware.vcloud.site+xml"/>    <User href="https://bblab-vcd02.com/api/admin/user/4ac4c656-a5f4-48e7-b5cd-fc29847f2e2e" id="urn:vcloud:user:4ac4c656-a5f4-48e7-b5cd-fc29847f2e2e" name="adm.bbazan" type="application/vnd.vmware.admin.user+xml"/>    <Organization href="https://bblab-vcd02.com/api/org/a93c9db9-7471-3192-8d09-a8f7eeda85f9" id="urn:vcloud:org:a93c9db9-7471-3192-8d09-a8f7eeda85f9" name="System" 
type="application/vnd.vmware.vcloud.org+xml"/>    <Details></Details>    <VcTaskList/></Task>
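The DELETE returns a task rather than an immediate result, so it can be handy to read the status and task href out of the body. A minimal sketch, assuming a shortened copy of the task response above; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def task_summary(task_xml):
    """Pull the fields worth checking out of a VCD <Task> response."""
    root = ET.fromstring(task_xml)
    return {
        "status": root.attrib.get("status"),
        "operation": root.attrib.get("operation"),
        "href": root.attrib.get("href"),
    }


# Shortened copy of the task body returned by the DELETE above.
sample_task = (
    '<Task xmlns="http://www.vmware.com/vcloud/v1.5" status="queued" '
    'operationName="siteUpdate" '
    'operation="Updating Site bblab-vcd02.com(c913fce4-e24b-4c38-83aa-3f66c8ae8ae7)" '
    'href="https://bblab-vcd02.com/api/task/8d84c54f-d121-4332-8de0-5700a63696cf"/>'
)

summary = task_summary(sample_task)
print(summary["status"], summary["href"])
```

A GET against the task href can then be repeated until the status is no longer queued or running.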

Running the above “GET” again to return the associated sites will no longer list the removed site: https://bblab-vcd02.com/api/site/associations/

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><SiteAssociations xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1" xmlns:vssd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData" xmlns:common="http://schemas.dmtf.org/wbem/wscim/1/common" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:vmw="http://www.vmware.com/schema/ovf" xmlns:ovfenv="http://schemas.dmtf.org/ovf/environment/1" xmlns:vmext="http://www.vmware.com/vcloud/extension/v1.5" xmlns:ns9="http://www.vmware.com/vcloud/versions" href="https://bblab-vcd02.com/api/site/associations" type="application/vnd.vmware.admin.siteAssociations+xml">    <Link rel="edit" href="https://bblab-vcd02.com/api/site/associations" type="application/vnd.vmware.admin.siteAssociations+xml"/>    <Link rel="edit" href="https://bblab-vcd02.com/api/site/associations" type="application/vnd.vmware.admin.siteAssociations+json"/>    <Link rel="add" href="https://bblab-vcd02.com/api/site/associations" type="application/vnd.vmware.admin.siteAssociation+xml"/>    <Link rel="add" href="https://bblab-vcd02.com/api/site/associations" type="application/vnd.vmware.admin.siteAssociation+json"/>    <Link rel="down" href="https://bblab-vcd02.com/api/site/associations/localAssociationData" type="application/vnd.vmware.admin.siteAssociation+xml"/>    <Link rel="down" href="https://bblab-vcd02.com/api/site/associations/localAssociationData" type="application/vnd.vmware.admin.siteAssociation+json"/>    <Link rel="up" href="https://bblab-vcd02.com/api/site" type="application/vnd.vmware.vcloud.site+xml"/>    <Link rel="up" href="https://bblab-vcd02.com/api/site" type="application/vnd.vmware.vcloud.site+json"/></SiteAssociations>

On the target site side (in this example bblab-vcd01 or bblab-vcd02), we see the connection is now listed as “ASYMMETRIC”:

 <RestEndpoint>https://bblab-vcd02.com</RestEndpoint>            <BaseUiEndpoint>https://bblab-vcd02.com</BaseUiEndpoint>            <SiteId>urn:vcloud:site:c913fce4-e24b-4c38-83aa-3f66c8ae8ae7</SiteId>            <SiteName>bblab-vcd02.com</SiteName>            <PublicKey>-----BEGIN PUBLIC KEY-----MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA3cOHHCKWMJEx6Kgvta+cPWowhP1T3jpnFHOzyNUDJFH8JQ4s6+KrJq25isrhd1btevG4iuvZG8pQEIFq2HMM/rYftxw9X/vWUNE0onuRVA7T2R-----END PUBLIC KEY-----</PublicKey>            <Status>ASYMMETRIC</Status>

Make sure to remove the association from the other side as well. In versions before 10.3, not removing both sides of the connection may cause performance issues.

For more multisite goodness: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vcat/architecting-multi-site-vmware-vcloud-director.pdf

VMware Cloud Director + NSX-T + vRLI Alert: Platform config fault errors during vApp power on

Troubleshooting notes that will hopefully be useful to future me or others that stumble upon this one

Hosts within NSX-T were showing a Down status under System –> Fabric –> Nodes –> Host Transport Nodes, and vApp power on in VCD failed with:

[ 5a6a21d5-3f29-42df-bad8-443399ef897d ] Internal Server Error

  • java.util.concurrent.ExecutionException: com.vmware.ssdc.library.exceptions.MultipleLMException: One or more exceptions have occurred – Multiple Exceptions follow: [com.vmware.ssdc.library.exceptions.VCPlatformConfigException: Platform config fault reported by vCenter Server.
    Platform Config fault occurred.
    An error occurred during host configuration.

vCenter Server task (moref: task-4514366) failed in vCenter Server ‘bblab-vc01’ (8b9de7b5-93df-48c2-8926-91ffc483dbf2)., com.vmware.ssdc.library.exceptions.VCPlatformConfigException: Platform config fault reported by vCenter Server.
Platform Config fault occurred.
An error occurred during host configuration.

vCenter Server task (moref: task-4514361) failed in vCenter Server ‘bblab-vc01’ (8b9de7b5-93df-48c2-8926-91ffc483dbf2).]

This is something that can be easily monitored. Within vRLI (vRealize Log Insight), an alert can be created in an attempt to be proactive and resolve the issue before customer impact.

Within the vRLI alert, the resolution is included:

Please SSH to the host and run the following command to start nsx-opsagent and bring this host back to working order

/etc/init.d/nsx-opsagent start

If this command is not run, the workloads on the host will fail to start, causing errors.

Reminder: the platform config error can mean many different things. In this case, the ops agent was broken, and I’ve seen this error many times lately.
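For anyone without vRLI handy, the same match the alert is built around can be approximated against a captured log or task error. This is only a sketch: the two patterns are taken from the error text above, and the function name is hypothetical.

```python
import re

# Patterns lifted from the error text in this post.
PLATFORM_CONFIG_FAULT = re.compile(
    r"VCPlatformConfigException: Platform config fault reported by vCenter Server"
)
TASK_MOREF = re.compile(r"moref: (task-\d+)")


def failed_vc_tasks(log_text):
    """Return the vCenter task morefs if the platform config fault is present."""
    if not PLATFORM_CONFIG_FAULT.search(log_text):
        return []
    return TASK_MOREF.findall(log_text)


sample_log = """com.vmware.ssdc.library.exceptions.VCPlatformConfigException: Platform config fault reported by vCenter Server.
vCenter Server task (moref: task-4514366) failed in vCenter Server 'bblab-vc01'
vCenter Server task (moref: task-4514361) failed in vCenter Server 'bblab-vc01'"""

print(failed_vc_tasks(sample_log))
```

The task morefs can then be looked up in vCenter to find which host reported the configuration error.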

VMware Cloud Director: Unregister vCenter fails ERROR: update or delete on table “virtual_center” violates foreign key constraint “fk_vm_virtualcenter” on table “vm”

Standard disclaimer to start this one off: deleting anything from the DB is not supported. Do this at your own risk!!

Have you or someone you know experienced foreign key constraints in VMware Cloud Director aka VCD? If so, please read on…. but be warned, this post contains graphic database content and is not for the squeamish:

Have you seen errors like

  • [ 8ab9ac99-5896-4c3a-81be-d5c1a8ba021b ] org.hibernate.exception.ConstraintViolationException: could not delete: [com.vmware.vcloud.common.model.VirtualCenterModel#037c42a9-5555-4444-1111-8db555b6e2b0]
  • OR: could not delete: [com.vmware.vcloud.common.model.VirtualCenterModel#037c42a9-5555-4444-1111-8db555b6e2b0]
  • ERROR: update or delete on table “virtual_center” violates foreign key constraint “fk_vm_virtualcenter” on table “vm”
    Detail: Key (id)=(037c42a9-5555-4444-1111-8db555b6e2b0) is still referenced from table “vm”.

This means there are still vm objects in the database that most likely have already been deleted. Do not worry, this can be fixed. To be safe, first make sure that 100% of the VMs have already been removed from the vCenter in question and that the original error still occurs.

Be safe, take a backup before diving in

Click on the vCenter in the VCD UI and grab the ID from the URL or simply grab it from the error message. In the DB, do a select to make sure that you only have 1 record in this virtual_center table.

bblab-db01=# select * from virtual_center where id='037c42a9-5555-4444-1111-8db555b6e2b0';
                  id                  |    name     | description |                  url                  |          username           |
                 password                                         | is_enabled |                 uuid                 | vc_version  | vc_update_level | vsphere_w
eb_client_url | is_use_vsphere_service | mark_for_delete | status | version | workload_folder_name | workload_folder_moref | compute_provider_scope | listener_co
nfig | tenant_scoped | provider_scoped | vc_none_network_name | vc_none_network_moref | comp_version_id | proxy_id | vc_build_number | certificate_id
--------------------------------------+-------------+-------------+---------------------------------------+-----------------------------+------------------------
------------------------------------------------------------------+------------+--------------------------------------+-------------+-----------------+----------
--------------+------------------------+-----------------+--------+---------+----------------------+-----------------------+------------------------+------------
-----+---------------+-----------------+----------------------+-----------------------+-----------------+----------+-----------------+----------------
 037c42a9-b59b-43de-9514-8db555b6e2b0 | bblab-vc02 |             | https://bblab-vc02/sdk | administrator@vsphere.local | Wlongstringofdataherethatdoesnotneedtobeinthisposthere== | f          | asdfasdf-190c-4055-8985-asdf65165asdf| TBD | 0               |
              | f                      | f               | READY  |      11 |                      |                       |                        |
   0 | f             | t               |                      |                       | 7.0.0.0         |          | 20178686        |
(1 row)

bblab-db01=#

Now that we are sure there is only a single vCenter we are running against, do a select to find the VMs in this vCenter. It would be recommended that you remove them 1 by 1 as you see each id within the painful error above; however, you could be daring and delete all VMs …you’ve come this far…

bblab-db01=# SELECT * from vm where vc_id='037c42a9-5555-4444-1111-8db555b6e2b0';
(38 rows)
bblab-db01=#
bblab-db01=# DELETE from vm where vc_id='037c42a9-5555-4444-1111-8db555b6e2b0';
DELETE 38
bblab-db01=#

After I recklessly deleted all vms from that table where the vC ID matched, I was able to delete the vCenter from the VCD UI without issue.
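To see why the order of operations matters, the constraint can be reproduced in a throwaway database. The sketch below uses an in-memory SQLite database with two heavily simplified tables as a stand-in; it is not the VCD PostgreSQL schema, and the function name is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE virtual_center (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE vm (id TEXT PRIMARY KEY, "
    "vc_id TEXT REFERENCES virtual_center(id))"
)

vc_id = "037c42a9-5555-4444-1111-8db555b6e2b0"
conn.execute("INSERT INTO virtual_center VALUES (?, ?)", (vc_id, "bblab-vc02"))
conn.executemany(
    "INSERT INTO vm VALUES (?, ?)", [("vm-%d" % i, vc_id) for i in range(3)]
)
conn.commit()


def try_delete_vcenter(connection, vcenter_id):
    """Attempt the delete; return False if a foreign key still blocks it."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "DELETE FROM virtual_center WHERE id = ?", (vcenter_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


blocked = try_delete_vcenter(conn, vc_id)  # vm rows still reference the vCenter
with conn:
    removed = conn.execute("DELETE FROM vm WHERE vc_id = ?", (vc_id,)).rowcount
allowed = try_delete_vcenter(conn, vc_id)  # nothing references it any more
print(blocked, removed, allowed)
```

The first delete is rejected while vm rows still point at the vCenter, which mirrors the fk_vm_virtualcenter error; once those rows are gone, the same delete goes through.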

Please feel free to reach out, hope this helps

VMware Cloud Foundation(VCF) + VMware Cloud Director(VCD): Refreshing hardware

  1. Why swap hosts versus adding new vCenters or clusters?
  2. Infra layout:
  3. Pre-reqs:
  4. Adding hosts in VCF and the swap:
  5. Rinse and repeat:

The goal of this post is to document the process followed to refresh hardware live in an actively used VCF instance with VCD: not adding clusters, but rather moving new hosts into the existing clusters and decommissioning the old hosts.

Why swap hosts versus adding new vCenters or clusters?

Looking at this from a provider standpoint, causing downtime or requiring migration of workloads is impactful. Swapping hosts out in a small time window is done with workloads being vMotioned and is not impactful if the proper steps are taken. This method avoided downtime for the customer, which was key here.

Adding additional clusters would take more time to let the workload naturally move to these new clusters. In VCD, we would then need to make the PVDCs elastic and disable the old cluster for use there. This was another good option but required more config changes and went away from a standard single cluster design here.

Infra layout:

The management cluster, a single vCenter, has 6 out of warranty hosts that are being replaced with 6 brand new hosts.

Each workload domain has 11 out of warranty hosts, which will be replaced with 20 brand new hosts. We are starting with 6 workload domains (vCenters) and reducing to 3, so really going from 66 workload hosts down to 60 new, more powerful hosts.

All vCenters are using NSX-T and all workload domains are being consumed by VMware Cloud Director

Pre-reqs:

Ensure onboard slow NICs are disabled in the BIOS. We run ansible against the inventory files to complete this. This instance is running a /23 management network and a shared network for vSAN and vMotion. All switches backing the new hosts are configured to match the old switches. For NSX-T the edge subnets are different and will require new edges to be deployed, T0 interfaces updated, and BGP neighbors updated to reflect the new switches used for the new hosts.

Install ESXi that is in the current BOM for VCF. Example: VCF 4.4.1 requires VMware ESXi, 7.0.3, 19482537.

Set the NTP servers, set the NTP service to start with the host, and take the ESXi host out of maintenance mode. The script imports a hosts1.csv file that has a single column: the column header is “host”, and everything below it is a hostname.

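# hosts1.csv has a single column with the header "host"
Import-Csv C:\hosts1.csv | ForEach-Object {
    $esx = $_.Host
    Connect-VIServer -Server $esx -User root -Password "P@SSW)RD_HERE"
    # Pass the NTP servers as two separate strings, not one comma-joined string
    Add-VMHostNtpServer -NtpServer "time1.domain.com", "time2.domain.com" -VMHost $esx | Out-Null
    # Set ntpd to start with the host, then start it now
    Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "ntpd" } |
        Set-VMHostService -Policy On | Start-VMHostService
    # $Host is a built-in PowerShell variable, so use our own variable here
    Write-Output "NTP Server was changed on $esx"
    # Take the host out of maintenance mode
    Set-VMHost -VMHost $esx -State Connected
    Disconnect-VIServer -Server $esx -Confirm:$false
}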

Adding hosts in VCF and the swap:

Commission hosts via a JSON template to import. The JSON template can be downloaded from the Commission Hosts screen.

Make sure the file passes a JSON validator such as jsonlint.com. If it contains any duplicates, you must cancel out of the screen and import again; a bug has been filed internally for this issue.
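Both checks can also be done locally before touching the import screen. A minimal sketch, assuming the template keeps its hosts in a "hostsSpec" list with a "hostfqdn" field per entry; adjust the key names to whatever the downloaded template actually contains. The function name and sample hosts are hypothetical.

```python
import json
from collections import Counter


def duplicate_hosts(template_text):
    """Parse the template and return any host FQDN listed more than once.

    json.loads raises json.JSONDecodeError if the file is not valid JSON.
    The "hostsSpec" and "hostfqdn" keys are assumptions about the template.
    """
    spec = json.loads(template_text)
    fqdns = [entry["hostfqdn"].lower() for entry in spec["hostsSpec"]]
    return sorted(name for name, count in Counter(fqdns).items() if count > 1)


# Hypothetical template content for illustration only.
sample_template = """{
  "hostsSpec": [
    {"hostfqdn": "bblab-esx01.bblab.com", "username": "root"},
    {"hostfqdn": "bblab-esx02.bblab.com", "username": "root"},
    {"hostfqdn": "bblab-esx01.bblab.com", "username": "root"}
  ]
}"""

print(duplicate_hosts(sample_template))
```

An empty list means the file parsed cleanly and no host appears twice.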

After commissioning hosts via the above process, we allocated a single host to the mgmt cluster via the VCF workflow. Once allocated to the cluster, ensure all storage is attached/mounted if it falls outside of what is known by VCF. In this case, the NFS storage falls outside of VCF and needed to be mounted to the new hosts manually.

***If the old hosts are kept long term, EVC might need to be enabled for VMs to vMotion between new and old hosts***

Once the first host is allocated to the management cluster, move a VM to this host and ensure the desired network connectivity. Our testing worked without issue and no additional changes were required

***It is worth noting that in this management cluster NSX-T is in use, but there are no edges present and the networking is VLAN backed***

Adding hosts to a WLD (Workload Domain) is the same process with one change. In this use case, new edges need to be deployed here. These edges will be added to the edge cluster in NSX-T, have interfaces added to the T0s, and have the BGP neighbors updated. Kicking things off, 2 hosts were added via the VCF workflow. These 2 hosts were used for the new NSX-T edge deployment.

***This instance has multiple edge nodes in use for T0 and VCD routed orgvdc networks. Replacing them 1 at a time avoided network interruption***

After deploying the new edges, adding them to the edge cluster is not really an add but a replacement. This can be found under System –> Fabric –> Nodes –> Edge Clusters –> (Select edge cluster) ACTIONS –> Replace Edge Cluster Member. The NSX-T official documentation details the process. When the nodes were replaced, it required circling back to the T0 to update the interface details with a new subnet and name. After then updating the BGP config, all alarms cleared without issue. Repeat for any number of edges.

Now that networking is all set, the rest of the hosts were added to the WLD. This is not a quick process, but once it completes, the old hosts can be removed from the cluster via the VCF workflow. This is nested under Inventory –> Workload Domains –> Click on desired Workload Domain –> Click on desired Cluster –> Hosts –> select any hosts to be removed. Once again, this is not a quick process since each host needs a full vSAN data evacuation. The process that VCF follows does migrate all content off of the host, and VCD had no issues with the content moves.

Rinse and repeat:

For each MGMT domain, WLD, or cluster required, follow the above process. This method worked for a VMware Cloud Director instance with 100s of VCD vApp templates, Shadow VMs, and active workload without issue.

VMware Cloud Director 10.4: Upgrade Considerations, Console Proxy Changes and Catalog Enhancements

  1. Security Changes:
  2. Console Proxy and Load Balancer Changes:
  3. Legacy Console Proxy:
  4. **OPTIONAL CLEANUP**
  5. Catalog Enhancements:

As Tomas pointed out in this post, there have been some great changes to how Console Proxy works in VCD 10.4, led by Francois Misiak! There are some additional notes, steps and cleanup that can be completed in this major release of VCD.

Security Changes:

VMware Cloud Director 10.4 implements a new security requirement that the root CA from the ESXi hosts is trusted within VCD. This needs to be completed post upgrade and could save some confusion about why console sessions are not working. The steps are covered in KB 78885. If that is not completed, console sessions fail to load with an error similar to:

18:44:36,631 | ERROR | pool-jetty-52 | ServerWebSocket | Connecting to ESX bblab.esx01.com [server: [L=/1
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480)

Console Proxy and Load Balancer Changes:

In this example instance, the load balancer pools were broken into 2 groups: VMRC and UI/API. In 10.4 there is no longer a dedicated console proxy port and URL, and all traffic flows over port 443:

New Console Proxy/UI/API Configuration in 10.4

In this configuration we needed to add all cells to the UI/API group so they would all be used in the same way. From there we reclaimed the additional public IP and deleted DNS records that were in use. No other changes needed to be made.

Legacy Console Proxy:

The option to switch back still does exist and can be enabled under Administration –> Settings –> Feature Flags –> Select LegacyConsoleProxy and click enable. This does require cell reboots to take effect.

**OPTIONAL CLEANUP**

If this cleanup is completed you cannot enable the Legacy Console Proxy as it removes the settings and files required for Legacy Console Proxy to function!!

To clean up the old configuration from the cells, there is a command that can be run from the cell-management-tool. In the vcloud-director/bin directory (in the Linux install), run the cell-management-tool:

[root@bblab-vcd-cl01 bin]# ./cell-management-tool
Cell Management Tool v10.4.0.X
Type "help" for available subcommands.
cmt> clear-console-proxy-settings -h
usage: clear-console-proxy-settings [options]
  -c,--c <arg>    [Optional] Global config settings file location. Default will
                  use VCLOUD_HOME/etc/global.properties
  -h,--help       Get help.
  -r,--r <arg>    [Optional] Responses settings file location. Default will use
                  VCLOUD_HOME/etc/responses.properties
Console Proxy with unified endpoint which is enabled by default does not require
any console proxy settings. This command clears them out.
cmt>
cmt> clear-console-proxy-settings
Successfully cleared console proxy settings.
cmt>

This cleanup will delete the files that the following paths point to, which are associated with the console proxy, if they are unique and not the same as user.certificate.path and user.key.path:

user.consoleproxy.certificate.path <--Will be removed
user.consoleproxy.key.path <--Will be removed

Before running the above command, I captured the responses.properties and global.properties files to compare before and after (removing sensitive info):

The following lines were removed from the global.properties file:

  • consoleproxy.host.https
  • consoleproxy.port.https
  • consoleproxy.keystore.password
  • consoleproxy.keystore.path
  • user.consoleproxy.certificate.path
  • user.consoleproxy.key.path
  • user.consoleproxy.key.password

The following lines were removed from the responses.properties file:

  • user.consoleproxy.certificate.path
  • user.consoleproxy.key.path
  • user.consoleproxy.key.password
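The before and after comparison is easy to repeat with a short script. This is a sketch only: the parser handles plain key = value lines, the sample contents are made up, and the function names are my own.

```python
def property_keys(text):
    """Collect the keys from simple 'key = value' properties content."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def removed_keys(before, after):
    """Keys present in the 'before' capture but missing from 'after'."""
    return sorted(property_keys(before) - property_keys(after))


# Made-up file contents for illustration; the real files hold many more keys.
before = """# captured before clear-console-proxy-settings
user.certificate.path = /placeholder/user.cert.pem
user.consoleproxy.certificate.path = /placeholder/consoleproxy.cert.pem
user.consoleproxy.key.path = /placeholder/consoleproxy.key.pem
"""
after = """# captured after clear-console-proxy-settings
user.certificate.path = /placeholder/user.cert.pem
"""

for key in removed_keys(before, after):
    print(key)
```

Reading the two captured copies of each file into `before` and `after` gives the same list of removed keys as above without diffing by eye.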

Catalog Enhancements:

So many fantastic improvements here!

  • Subscribed Catalogs display specific detailed steps. From the release notes: “VMware Cloud Director now provides a detailed view of the currently running catalog synchronization task step and the progress percentage of that step”. That enhancement alone is a fantastic improvement and provides visibility into the exact transfer of content. Below is what this now looks like in the UI:

  • When a subscribed catalog is created with the “Automatically download the content from an external catalog” setting disabled, the catalog is subscribed and only the catalog items metadata is created. This allows users to download individual items from subscribed catalogs instead of the entire catalog. Why is that useful? In the past, if a tenant/customer had 150 vApp templates in CLOUD-A and only needed 12 of them in CLOUD-B, those 12 would need to be copied to a new catalog, published and subscribed in CLOUD-B. Now in 10.4 there is no need to create specific catalogs and make many redundant copies of content. This enhancement allows tenants/customers to select which templates are synced over. In the test below, Catalog: bbtest-10-4-auto-disable has the “Automatically download the content from an external catalog” setting disabled and has only synced the metadata:

Notice the size of each vApp template: since only metadata was created, no VMDKs are synced over until the item is selected and “SYNC” is clicked and completes.

There are many more enhancements in this feature rich release, which will be covered in additional posts. A massive thanks and credit go to the Engineering team working behind the scenes to make VCD an even better, more feature rich product!

VCF Upgrade of NSX-T: Post Upgrade Validation fails error_message : No backup schedule provided in config, httpStatus : BAD_REQUEST, error_code : 29204

While trying to finish an NSX-T upgrade within VCF, it failed on the post upgrade check. Turns out, it was looking to see if the backup job was scheduled. Seeing that error, it seemed simple to enable within the UI. Navigating to the UI, the “Schedule Recurring Backup” menu seemed to die and never actually came back, so scheduling via the UI was not possible.

Looking around and trying to use PowerShell did not result in a quick fix. Checking out the API, it turns out there is a simple call to check the backup config:

#Done in Postman: 
GET https://bblab-nsxt.bblab.com/api/v1/cluster/backups/config

Using the results from above and what the API guide lists as required, the schedule can be set with a PUT:

#Done in Postman:
PUT https://bblab-nsxt.bblab.com/api/v1/cluster/backups/config

#BODY:
{
   "backup_enabled" : true,
   "backup_schedule":{
      "resource_type": "WeeklyBackupSchedule",
      "days_of_week":[
         1,
         3,
         5
      ],
      "hour_of_day":0,
      "minute_of_day":0
   },
    "remote_file_server": {
        "server": "bblab-backup1.bblab.com",
        "port": 22,
        "protocol": {
            "protocol_name": "sftp",
            "ssh_fingerprint": "SHA256:8rSvQEq8d8D*9389D8jKJDjk89dqZP0",
            "authentication_scheme": {
                "scheme_name": "PASSWORD",
                "username": "adminofbackups"
            }
        },
        "directory_path": "/nsxt-backup/bblab-nsxt"
    }
}

The above PUT command worked without issue. VCF could then finish the post upgrade check, and all went well.
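The same check and fix can be sketched in Python for anyone scripting this instead of using Postman. The field names come from the body above; the function names are hypothetical, and sending the result to the API is left out on purpose.

```python
import json


def has_backup_schedule(config):
    """True when the backup config is enabled and carries a schedule."""
    return bool(config.get("backup_enabled")) and "backup_schedule" in config


def with_weekly_schedule(config, days_of_week, hour_of_day, minute_of_day):
    """Return a copy of the GET result with a weekly schedule filled in."""
    updated = dict(config)
    updated["backup_enabled"] = True
    updated["backup_schedule"] = {
        "resource_type": "WeeklyBackupSchedule",
        "days_of_week": list(days_of_week),
        "hour_of_day": hour_of_day,
        "minute_of_day": minute_of_day,
    }
    return updated


# Hypothetical GET result with no schedule, as in the failed check.
current = {
    "backup_enabled": False,
    "remote_file_server": {"server": "bblab-backup1.bblab.com", "port": 22},
}

body = with_weekly_schedule(current, [1, 3, 5], 0, 0)
print(has_backup_schedule(current), has_backup_schedule(body))
print(json.dumps(body["backup_schedule"], indent=2))
```

The resulting dictionary, serialized with `json.dumps`, is what would go into the PUT body.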

VMware Cloud Director: Rights Bundles and Global Roles

Over the years there have been many requests in my current role to allow a specific set of users the ability to consolidate VMs. Thankfully that right exists and can be added to any Organization and role that is required.

Looking at the typical set of Global Roles that exist in Cloud Director, and specifically at the Organization Administrator Global Role, we see the right exists but is unchecked under COMPUTE –> vApp –> Migrate / Force Undeploy / Relocate / Consolidate vApp VMs (which is near the bottom of the list in this version). Select that box and click save.

Why write a blog post around a simple checkbox, one may ask? Well, checking that box under Global Roles does not by itself make the right take effect. Any right selected in a Global Role also has to be part of a Rights Bundle that is published to the specific tenants.

Jump over to Rights Bundles under Tenant Access Control and see these usual rights bundles.

**If a “Legacy Rights Bundle” is listed, it will override non-legacy Rights Bundles and the change needs to be made there for the specific tenant**

Create a new Rights Bundle.

In this case it is named Consolidate, with the required description provided. Scrolling down the list of rights, under COMPUTE, expand vApp and check the box for “Migrate / Force Undeploy / Relocate / Consolidate vApp VMs”.

Save that and then select the Rights Bundle created above and click PUBLISH:

Select the tenant to push these rights to:

Breaking this down, if the Rights Bundle is published to a tenant but the Global Role is not updated the tenant will be able to see the right that has been added but it will not be selected. Similar to this instance:

Make sure the Global Role edited above was also published to the required tenant. Hope that helps, and please reach out with any questions.

***This does apply the rights to consolidate vApp VMs and vApp Template VMs***
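One way to keep the two pieces straight is to think of the usable rights as the overlap between what the role has selected and what the published Rights Bundles make available. The sketch below is only a simplified mental model of the behavior described in this post, not how VCD implements it.

```python
RIGHT = "Migrate / Force Undeploy / Relocate / Consolidate vApp VMs"


def effective_rights(role_rights, published_bundle_rights):
    """Simplified model: a right works only when it is selected in the role
    AND made available to the tenant through a published Rights Bundle."""
    return set(role_rights) & set(published_bundle_rights)


# Role updated, but no bundle with the right published to the tenant.
role_only = effective_rights({RIGHT}, set())
# Bundle published, but the Global Role was not updated: the tenant sees
# the right as available, yet it is not selected.
bundle_only = effective_rights(set(), {RIGHT})
# Both pieces in place.
both = effective_rights({RIGHT}, {RIGHT})

print(RIGHT in role_only, RIGHT in bundle_only, RIGHT in both)
```

Only the last case gives the tenant's users a working right, which is why both the Global Role and the Rights Bundle need to be updated and published.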

VMware Cloud Director – Failed to delete folder from vCenter Server. Could not delete folder “vApp-bblab” because it contains: child “vm-12345”

When attempting to delete vApps (possibly in bulk) there are occasions where some remain in an unresolved state with no Virtual Machines present within the VCD UI:

From the VCD UI under Recent Tasks or the Monitor tab, this type of message appears:

Looking into the VCD Debug logs:

From Debug logs: 
 activity=(com.vmware.vcloud.vdc.impl.DeleteVappActivity,urn:uuid:8c6b91ce-72a4-4eb9-b1f7-31951ede930b)
2021-09-23 15:35:12,102 | ERROR    | Backend-activity-pool-4546 | DeleteVappActivity             | [Activity Execution] Uncaught Exception during Activity execution. Recent phase: com.vmware.vcloud.vdc.impl.DeleteVappActivity$FabricsCleanedPhase@6b7b7ba7 - Handle: urn:uuid:, Current Phase: DeleteVappActivity$FabricsCleanedPhase | requestId=,request=DELETE https://bblab.com/api/vApp/vapp-94a35eb5-7762-46d1-95c0-7d77c089a6d4,requestTime=1632411311263,remoteAddress=10.10.10.10:55491,userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...,accept=application/*+xml;version 37.0.0-alpha vcd=2f57d2ff-c33a-4499-9ac5-b5f1b0e2d4c2,task=91234567-c0ad-4829-9fe7-d81123456789 activity=(com.vmware.vcloud.backendbase.management.system.TaskActivity,urn:uuid:12345678-c1ad-1234-1fe7-d81123456789) activity=(com.vmware.vcloud.vdc.impl.DeleteVappActivity,urn:uuid:8c6b91ce-72a4-4eb9-b1f7-11951ede930b)java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.vmware.ssdc.util.LMException: Failed to delete folder [vcId=259e22a3-eda1-4e34-2b1e-55a57dc236d8, moref=group-v789012] from vCenter Server.
        at com.vmware.vcloud.val.purger.impl.FolderPurgeUtil.purge(FolderPurgeUtil.java:58)
        at com.vmware.vcloud.val.purger.impl.PurgeServiceImpl.purgeFolder(PurgeServiceImpl.java:49)
        at com.vmware.vcloud.vdc.impl.DeleteVappActivity$FabricsCleanedPhase.invoke(DeleteVappActivity.java:200)
        at com.vmware.vcloud.activity.executors.ActivityRunner.runPhase(ActivityRunner.java:175)
        at com.vmware.vcloud.activity.executors.ActivityRunner.run(ActivityRunner.java:112)
        ... 5 more
Caused by: com.vmware.ssdc.library.exceptions.FolderNotEmptyException: Could not delete folder "vApp-bblab (22a28eb9-1234-33d1-95c0-7d77c089a6d4)" because it contains:
child "vm-123456"
child "vm-789012"
        at com.vmware.vcloud.val.internal.impl.VC20VirtualEngine.DeleteFolder(VC20VirtualEngine.java:2551)
        at com.vmware.vcloud.val.purger.impl.FolderPurgeUtil.purge(FolderPurgeUtil.java:50)
        ... 9 more

The errors note there are still child VMs within the vApp. Knowing that the vApp itself is empty within VCD, moving down the stack the VMs can be found within vCenter still. The “vm-#####” can be found within vCenter either by using the ID listed in the VM or searching the unique ID of the vApp to ensure the correct VMs are found. Once found, these VMs can be deleted from disk:

Once they are deleted in vCenter, the vApp “folder” can be deleted within VCD without issue.
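When a bulk delete leaves several vApps behind, picking the leftover VM morefs out of the debug log by hand gets tedious. A small sketch for extracting them, assuming the log lines look like the excerpt above; the function name is my own.

```python
import re

# Patterns based on the FolderNotEmptyException lines in the debug log.
FOLDER = re.compile(r'Could not delete folder "([^"]+)"')
CHILD_VM = re.compile(r'child "(vm-\d+)"')


def leftover_vms(log_text):
    """Return (folder name, list of child VM morefs) found in the log text."""
    folder = FOLDER.search(log_text)
    return (folder.group(1) if folder else None, CHILD_VM.findall(log_text))


sample_log = '''Caused by: com.vmware.ssdc.library.exceptions.FolderNotEmptyException: Could not delete folder "vApp-bblab (22a28eb9-1234-33d1-95c0-7d77c089a6d4)" because it contains:
child "vm-123456"
child "vm-789012"
'''

folder_name, morefs = leftover_vms(sample_log)
print(folder_name)
print(morefs)
```

The morefs returned are the IDs to search for in vCenter before deleting anything from disk.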

***As with any blog post, please be aware that once these VMs are deleted from disk they are actually deleted. Meaning they are not coming back 🙂 please be cautious and ensure the VMs are no longer required***

NSX-T: Upgrade from 3.0.2 to 3.1.1 UI Errors “Failed to get *”

A fantastic colleague of mine was working on upgrades of NSX-T earlier this year. During the upgrade while checking the cluster status the UI began displaying errors:

The status is misleading, as the upgrade was progressing as expected. There is no need to panic and reboot manager nodes! After waiting 7-10 minutes and checking the cluster status, the errors were gone as the upgrade continued. This does not appear to happen after NSX-T 3.1.1.

vCloud Director: Post upgrade to 10.2.2 Cell or Cells are inactive

During a normal upgrade to 10.2.2 that has been completed many times without issue, a colleague of mine noticed a single cell in an “Inactive” state in the UI:

This was not apparent post upgrade until the service on that single cell was restarted. Attempting to fix this cell, the service was restarted again, which made no difference (it failed to start), and rebooting the cell did not fix the issue either. Rebooting additional cells resulted in those cells failing to start and showing as Inactive as well.

Next step was to dive into the logs and look to see why the service was not starting and the cell was showing inactive but nothing jumped out as “the problem”. Working with a very diligent support engineer, the following was found:

Caused by: java.util.concurrent.TimeoutException: Timed out waiting for service: 'filterConfigurationService', objectClasses='[interface com.vmware.vcloud.security.filters.FilterConfigurationService]', filter='(objectClass=com.vmware.vcloud.security.filters.FilterConfigurationService)'
       at com.vmware.vcloud.common.service.OsgiServiceReferenceFactoryBean.getObject(OsgiServiceReferenceFactoryBean.java:234)
       at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:178)

This was leading down a path of an issue that had been seen before where the “jms.user.system.password” was blank within the database. To validate, run the following:

select * from config where name = 'jms.user.system.password';

If the above does not return any value, the password is not set.

To fix this issue, restore the pre-upgrade backup into a new temporary database and recover the password from it. First create a database template:

#Create a template based on your existing DB
CREATE DATABASE dbname TEMPLATE template0;

#Create a temp DB based on template:
createdb -U test -T template0 vcddb-temp

#Restore DB backup pre upgrade to temp DB created above:
psql -U test vcddb-temp -f /path/to/vcddb-backup.dump.out

Run the select command above against the temporary database to retrieve the password. Using that password, run the following insert command against the live VCD database:

insert into config (config_id,catvalue,name,value,sortorder) values ('588','vcloud','jms.user.system.password','xxxxxxxxxx','0');

Restart the cells and they should come back up as expected!