About me: I am Abbed Sedkaoui. I have worked on VMware virtualization since GSX and ESX 3, and before that on Virtual Server and Virtual PC from Connectix, the company that also first made Virtual Game Station (VGS, a PlayStation emulator that fit on a 1.41MB floppy disk) back in 1998, all the way up to today's VMware Cloud Foundation (VCF), the infrastructure the latest VMware Cloud SDDC is based on.
In my view, "it" (the Cloud) all started around 2008 with the advent of AMD Nested Pages and then, in 2009, Intel Extended Page Tables in their processors; hardware-assisted virtualization became the trend for a lot of compute, much as it had for routers (VRF, Virtual Routing and Forwarding), firewalls (contexts), and switches (VSI, Virtual Switching Instance).
And happily for us labbers, since then we have had the ability to deploy end-to-end, fully virtualized infrastructure :) I have been following William Lam since around that time. Fast forward to 2023: having successfully deployed VCF, I am looking to certify VCP-VMC, as the required course is offered for FREE! Look at Required Step 2.
About this site: I'll share what worked for me when facing issues, and "the problem solving critical thinking mindset" (I know, it's a mouthful :) used to document root cause analysis. Please don't mind the rusticness of this site; I literally created it from scratch on AWS in a few hours.
03/29/2024 Tutorial Install VMware Cloud Director 10.5.1.1 and Create Provider Virtual Data Center (pVDC) backed by vSphere with Tanzu Kubernetes 8.0u2 and NSX-T 4.1.2
The product page vmware.com/products/cloud-director.html
First, download the OVA file; it's a 2GB download from VMware Customer Connect: VMware_Cloud_Director-10.5.1.11019-23401219_OVF10.ova
- Prerequisite: NFS Network Path and DNS records
- Deploy the OVF Template
- Primary Appliance Setup (VAMI or Virtual Appliance Management Interface is on https://$VCDIP:5480)
- Add Resources: vCenter Server from a vSphere with Tanzu Supervisor Cluster already deployed
- Add Resources: NSX-T Manager already deployed, and create a Geneve-backed Network Pool
- Create Provider VDC backed by vSphere with Tanzu Supervisor Cluster and NSX-T. This continues the Service Provider's tasks highlighted below, in the highly available multi-vSphere-zone Supervisor lab.
- Next Service Provider's Tasks: Create a provider VDC backed by a Supervisor Cluster, publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director, and offer Kubernetes as a Service (CaaS).
Prerequisite: NFS Network Path and DNS records
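Before deploying the appliance, the NFS transfer share and the DNS records can be sanity-checked from the shell. This is a hedged sketch only: the hostname, IP, path, and export options below are lab assumptions, not official VCD requirements (check the VMware docs for the exact export options your version expects).

```shell
# Hypothetical lab values, adjust to your environment
VCD_FQDN="vcd.lab.local"        # assumed appliance FQDN
EXPORT_PATH="/data/transfer"    # assumed NFS path for the VCD transfer share

# Build the /etc/exports entry; no_root_squash is commonly needed so the
# appliance's root user can write to the share (assumption, verify per docs).
make_export_line() {
  printf '%s *(rw,sync,no_subtree_check,no_root_squash)\n' "$1"
}
make_export_line "$EXPORT_PATH"

# Forward and reverse DNS must agree before deploying the OVA:
#   nslookup "$VCD_FQDN"     # A record
#   nslookup 172.17.31.50    # PTR record (hypothetical IP)
```

If the forward and reverse lookups disagree, the appliance setup will complain later, so it is worth checking now.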
Deploy the OVF Template
Primary Appliance Setup (VAMI or Virtual Appliance Management Interface is on https://$VCDIP:5480)
Add Resources: vCenter backed by a vSphere with Tanzu Supervisor Cluster
Add Resources: NSX-T Manager and Create a Geneve-backed Network Pool
Create Provider VDC backed by vSphere with Tanzu Supervisor Cluster and NSX-T
For a more comprehensive approach on how to offer Kubernetes as a Service with VMware Cloud Director,
whether you're a VMware Cloud Provider Partner or just want the high-level view,
take a look at the latest Feature Friday on the subject:
Feature Friday Episode 144 - Kubernetes as a Service with Cloud Director
and download the whitepaper: Architecting Kubernetes-as-a-Service Offering with VMware Cloud Director
03/07/2024 This website is finally updated to HTTPS, for easier access for everyone.
02/23/2024 Honored to be part of the VMware vExpert community again in 2024!
https://vexpert.vmware.com/directory/10999
02/22/2024 Updated vSphere with Tanzu using NSX-T Automated Lab Deployment: enabling VLANs for Management (1731), T0 uplink (1751), TEP (301), VRF (1683) and Trunk (1683-1687), with traffic separation across 2 N-VDS
Fork Branch https://github.com/abbedsedk/vsphere-with-tanzu-nsxt-automated-lab-deployment/tree/vlan
Added 2 NSX switches (N-VDS):
- Tanzu-VDS1 MGMT(+EDGE UPLINK T0 Segment) "North-South" Traffic
- Tanzu-VDS2 Overlay "East-West" Traffic
Ref: 7.4.2.2 Multiple virtual switches as a requirement, NSX Reference Design Guide 4-1_v1.5.pdf, pp. 291-293
(Compliance such as PCI requiring separate dedicated infra components, cloud providers separating internal and external traffic, telco provider NFV standard and enhanced vswitch)
Migrate VMkernel vmk0 from the VSS to Tanzu-VDS1, then remove the old vSwitch0
2 Edge T0 interfaces (1 per edge) in Active-Active, scaling out up to 10, load balancing "North-South" traffic
2 TEPs per ESXi, 2 TEPs per Edge x 2 Edges = 2 x (2x2) = 8 tunnels for the bare minimum of 1 ESXi and 2 Edges, scaling out and load balancing "East-West" traffic
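The tunnel arithmetic above can be sketched as a one-liner, counting only host-to-edge tunnels as in the formula 2 x (2x2):

```shell
# Host-to-edge Geneve tunnels with a full TEP mesh:
# tunnels = host TEPs x (edge TEPs per edge x number of edges)
host_teps=2
edge_teps=2
edges=2
tunnels=$(( host_teps * edge_teps * edges ))
echo "$tunnels"   # 2 x (2x2) = 8
```

Each additional edge adds host_teps x edge_teps tunnels per ESXi, which is how the East-West load balancing scales out.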
Requirements:
3 VLANs, with 3 subinterface VLAN gateways on a virtual router (VLAN 1731 MGMT, VLAN 1751 EDGE UPLINK T0, VLAN 301 VTEPs),
2 edge nodes (T0 Active-Active),
a Trunk VLAN 4095 port group ("VMTRUNK") and
a NestedVM Mgmt VLAN 1731 port group ("1731-Network"),
1 VRF VLAN (1683) within a trunk VLAN range (1683-1687) (of up to 5 in a single Project), 1 subinterface VLAN gateway, and correspondingly at least 2 (max 10) T0 VRF interfaces (1 per edge)
Virtual Router VM:
- NIC1: outer ESXi "VM Network",
- NIC2: outer "VMTRUNK",
Interfaces, VLANs, MTU, Source NAT, DNS Forwarding, NTP, SSH,
Static Routes:
- Default Route,
- Route to Supervisor Namespace (10.244.0.0/23) via T0 interfaces (A/A) (172.17.51.121, 172.17.51.122),
- Routes to Supervisor Ingress (172.17.31.128/27) and Egress (172.17.31.160/27) via T1 (10.244.0.1) (A/S)
VyOS config inspired by the template from the VyOS Module for PowerCLI,
and by William Lam's blog post How to automate... Here.
vyos@vyos:~$ show configuration commands | strip-private
set interfaces ethernet eth0 address 'xxx.xxx.1.253/24'
set interfaces ethernet eth0 hw-id 'xx:xx:xx:xx:xx:d9'
set interfaces ethernet eth0 ipv6 address no-default-link-local
set interfaces ethernet eth1 hw-id 'xx:xx:xx:xx:xx:e3'
set interfaces ethernet eth1 ipv6 address no-default-link-local
set interfaces ethernet eth1 mtu '1700'
set interfaces ethernet eth1 vif 301 address 'xxx.xxx.1.253/24'
set interfaces ethernet eth1 vif 301 description 'VLAN 301 for HOST/EDGE VTEP with MTU 1700'
set interfaces ethernet eth1 vif 301 ipv6 address no-default-link-local
set interfaces ethernet eth1 vif 301 mtu '1700'
set interfaces ethernet eth1 vif 1683 address 'xxx.xxx.3.253/24'
set interfaces ethernet eth1 vif 1683 description 'VLAN 1683 for EDGE UPLINK T0 VRF'
set interfaces ethernet eth1 vif 1683 ipv6 address no-default-link-local
set interfaces ethernet eth1 vif 1731 address 'xxx.xxx.31.253/24'
set interfaces ethernet eth1 vif 1731 description 'VLAN 1731 for MGMT'
set interfaces ethernet eth1 vif 1731 ipv6 address no-default-link-local
set interfaces ethernet eth1 vif 1751 address 'xxx.xxx.51.253/24'
set interfaces ethernet eth1 vif 1751 description 'VLAN 1751 for EDGE UPLINK T0'
set interfaces ethernet eth1 vif 1751 ipv6 address no-default-link-local
set nat source rule 1 outbound-interface name 'eth0'
set nat source rule 1 source address 'xxx.xxx.31.0/24'
set nat source rule 1 translation address 'masquerade'
set nat source rule 2 outbound-interface name 'eth0'
set nat source rule 2 source address 'xxx.xxx.51.0/24'
set nat source rule 2 translation address 'masquerade'
set nat source rule 3 outbound-interface name 'eth0'
set nat source rule 3 source address 'xxx.xxx.3.0/24'
set nat source rule 3 translation address 'masquerade'
set protocols static route xxx.xxx.0.0/0 next-hop xxx.xxx.1.x
set protocols static route xxx.xxx.0.0/23 next-hop xxx.xxx.51.121
set protocols static route xxx.xxx.0.0/23 next-hop xxx.xxx.51.122
set protocols static route xxx.xxx.31.128/27 next-hop xxx.xxx.51.121
set protocols static route xxx.xxx.31.128/27 next-hop xxx.xxx.51.122
set protocols static route xxx.xxx.31.160/27 next-hop xxx.xxx.51.121
set protocols static route xxx.xxx.31.160/27 next-hop xxx.xxx.51.122
set service dns forwarding allow-from 'xxx.xxx.0.0/0'
set service dns forwarding domain 3.168.192.in-addr.arpa. name-server xxx.xxx.1.100
set service dns forwarding domain 31.17.172.in-addr.arpa. name-server xxx.xxx.1.100
set service dns forwarding domain 51.17.172.in-addr.arpa. name-server xxx.xxx.1.100
set service dns forwarding listen-address 'xxx.xxx.31.253'
set service dns forwarding listen-address 'xxx.xxx.51.253'
set service dns forwarding listen-address 'xxx.xxx.3.253'
set service dns forwarding name-server xxx.xxx.8.8
set service dns forwarding name-server xxx.xxx.1.100
set service ntp allow-client xxxxxx 'xxx.xxx.0.0/0'
set service ntp allow-client xxxxxx '::/0'
set service ntp listen-address 'xxx.xxx.1.253'
set service ntp server xxxxx.tld
set service ssh port '22'
set system name-server 'xxx.xxx.1.100'
set system name-server 'xxx.xxx.8.8'
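With the TEP VLAN set to MTU 1700, the path can be validated end to end with an oversized do-not-fragment ping. A hedged sketch of the arithmetic (the peer TEP IP and the VyOS ping option names are assumptions; adjust to your VyOS version):

```shell
# Maximum ICMP payload that fits in a 1700-byte MTU:
# payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
mtu=1700
payload=$(( mtu - 20 - 8 ))
echo "$payload"   # 1672

# Then, from the VyOS op mode (assumed syntax and peer TEP IP):
#   ping 172.16.1.1 size 1672 do-not-fragment
# If this fails while a smaller size works, something along the path
# is still at the default 1500 MTU.
```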
Multiple vApp Deployment - Prerequisite: rename any tanzu-vcsa-4 VM before redeploying!
2 nested ESXi nodes with 24GB of RAM are for testing purposes only; to allow a Tanzu Supervisor Cluster plus a Tanzu Kubernetes Cluster, at least 28GB of memory is needed.
02/22/2024 New Tanzu using NSX-T Automated Lab Deployment with a single Nested ESXi (28GB minimum) and Workload Enablement with a single SupervisorVM and single-replica deployments.
Editing the Workload Control Plane (wcp) file is for lab purposes only, where support is not needed, because this editing breaks support.
We will change the number of masters from 3 to 1, change the disk provisioning from "thick" to "thin", and restart the service.
ssh root@tanzu-vcsa-4
vi /etc/vmware/wcp/wcpsvc.yaml
minmasters: 1
maxmasters: 1
controlplane_vm_disk_provisioning: "thin"
:wq!
service-control --restart wcp
Next, let's go to Workload Management.
Further editing of files in the SupervisorVM is for lab purposes only, where support is not needed, because this editing breaks support.
We will change the number of replicas from 3 to 1, and from 2 to 1, for the deployments in the namespaces starting with "vmware-system-" or "kube-system".
ssh root@tanzu-vcsa-4
/usr/lib/vmware-wcp/decryptK8Pwd.py
ssh root@IP
kubectl get deployments -A
Reduce from 3 to 1 replica:
bash <(kubectl get deployments -A -o json | jq -r '.items[] | select(.metadata.namespace | (startswith("vmware-system-") or contains("kube-system"))) | select(.status.replicas == 3) | "kubectl scale deployments/\(.metadata.name) -n \(.metadata.namespace) --replicas=1"')
Reduce from 2 to 1 replica:
bash <(kubectl get deployments -A -o json | jq -r '.items[] | select(.metadata.namespace | (startswith("vmware-system-") or contains("kube-system"))) | select(.status.replicas == 2) | "kubectl scale deployments/\(.metadata.name) -n \(.metadata.namespace) --replicas=1"')
watch 'kubectl get deployments -A'
Since the deployments are created as the bits finish downloading, they appear in the watch, and we have to Ctrl+C back to the shell and up-arrow to re-run the scale-down commands.
This little babysitting task leaves fewer containers running, which is desirable in a lab with limited resources. Lastly, there is one deployment that needs to be edited to comment out its anti-affinity rule:
ssh root@tanzu-vcsa-4
/usr/lib/vmware-wcp/decryptK8Pwd.py
ssh root@IP
kubectl get deployments.apps -n vmware-system-registry -o yaml > vmware-registry-controller-manager.yaml
vi vmware-registry-controller-manager.yaml
kubectl apply -f vmware-registry-controller-manager.yaml
deployment.apps/vmware-registry-controller-manager configured
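For reference, the stanza to comment out in the exported YAML looks roughly like this. This is an illustration only: the field names follow the standard Kubernetes pod anti-affinity schema, but the exact labels and topology key in the VMware deployment may differ from what is shown here.

```yaml
# Illustrative fragment: comment out (or delete) the pod anti-affinity stanza
# so a single-replica deployment can schedule on a one-node lab Supervisor.
spec:
  template:
    spec:
#      affinity:
#        podAntiAffinity:
#          requiredDuringSchedulingIgnoredDuringExecution:
#          - labelSelector:
#              matchLabels:
#                app: vmware-registry-controller-manager   # assumed label
#            topologyKey: kubernetes.io/hostname
```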
Next, deploy the T0 VRF as described.
Here we only need the following Project variables turned on; the VPC variables can be left at 0.
$deployProjectExternalIPBlocksConfig = 1
$deployProject = 1
Next, head over to Namespaces to create a Namespace: tick "Override Supervisor network settings" and select the T0 VRF in the dropdown menu.
Namespace configuration:
We will add Storage, User Permissions, VM size (VM Class), Content Libraries (TKRs), and download the CLI tools.
For the sake of simplicity, we will add the king kong administrator as well.
With VM Class (aka flavor) we set the size of the VMs in our TKC; here I chose "xsmall" (2 vCPUs, 2GB) for each Master (aka Control Plane) or Worker VM.
Download the kubectl + vSphere plugin and the vSphere Docker Credential Helper.
We will log in to the Supervisor Namespace, then switch to our VRF Namespace context, to apply a NetworkPolicy from a YAML via the CLI using kubectl.
kubectl vsphere login --server=172.17.31.130 -u administrator@vsphere.local --insecure-skip-tls-verify
KUBECTL_VSPHERE_PASSWORD environment variable is not set. Please enter the password below
Password:
Logged in successfully.
You have access to the following contexts:
   172.17.31.130
   t0vrf-1683-prj-2-ns1
If the context you wish to use is not in this list, you may need to try logging in again later, or contact your cluster administrator.
To change context, use `kubectl config use-context <workload name>`
kubectl config use-context t0vrf-1683-prj-2-ns1
Switched to context "t0vrf-1683-prj-2-ns1".
kubectl apply -f enable-all-policy.yaml
networkpolicy.networking.k8s.io/allow-all created
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
spec:
  podSelector: {}
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
Deploy a TKC on the VRF Namespace.
kubectl apply -f t0vrf-1683-prj2-tkc-v1alpha3.yaml
tanzukubernetescluster.run.tanzu.vmware.com/t0vrf-1683-prj2-tkc-v1alpha3 created
This Kubernetes YAML comes from the VMware Docs v1alpha3 example: TKC with Default Storage and Node Volumes.
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: t0vrf-1683-prj2-tkc-v1alpha3
  namespace: t0vrf-1683-prj-2-ns1
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-xsmall
      storageClass: tanzu-gold-storage-policy
      tkr:
        reference:
          name: v1.25.7---vmware.3-fips.1-tkg.1
    nodePools:
    - name: worker
      replicas: 1
      vmClass: best-effort-xsmall
      storageClass: tanzu-gold-storage-policy
      tkr:
        reference:
          name: v1.25.7---vmware.3-fips.1-tkg.1
      volumes:
      - name: containerd
        mountPath: /var/lib/containerd
        capacity:
          storage: 5Gi
      - name: kubelet
        mountPath: /var/lib/kubelet
        capacity:
          storage: 5Gi
  settings:
    storage:
      defaultClass: tanzu-gold-storage-policy
    network:
      cni:
        name: antrea
      services:
        cidrBlocks: ["198.53.100.0/16"]
      pods:
        cidrBlocks: ["192.0.5.0/16"]
      serviceDomain: cluster.local
kubectl get node -o wide - Log in to the Supervisor NS, then the VRF NS, then the TKC cluster, and switch to the TKC cluster context
kubectl vsphere login --server=172.17.31.130 -u administrator@vsphere.local --insecure-skip-tls-verify --tanzu-kubernetes-cluster-namespace t0vrf-1683-prj-2-ns1 --tanzu-kubernetes-cluster-name t0vrf-1683-prj2-tkc-v1alpha3
KUBECTL_VSPHERE_PASSWORD environment variable is not set. Please enter the password below
Password:
Logged in successfully.
You have access to the following contexts:
   172.17.31.130
   t0vrf-1683-prj-2-ns1
   t0vrf-1683-prj2-tkc-v1alpha3
If the context you wish to use is not in this list, you may need to try logging in again later, or contact your cluster administrator.
To change context, use `kubectl config use-context <workload name>`
kubectl config use-context t0vrf-1683-prj2-tkc-v1alpha3
Switched to context "t0vrf-1683-prj2-tkc-v1alpha3".
kubectl get node -o wide
NAME                                                         STATUS   ROLES           AGE   VERSION                   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION       CONTAINER-RUNTIME
t0vrf-1683-prj2-tkc-v1alpha3-ch9cz-dw2p9                     Ready    control-plane   23m   v1.25.7+vmware.3-fips.1   10.244.2.18   <none>        VMware Photon OS/Linux   4.19.283-3.ph3-esx   containerd://1.6.18-1-gdbc99e5b1
t0vrf-1683-prj2-tkc-v1alpha3-worker-d4t44-59cd4bc4bf-vtp94   Ready    <none>          14m   v1.25.7+vmware.3-fips.1   10.244.2.19   <none>        VMware Photon OS/Linux   4.19.283-3.ph3-esx   containerd://1.6.18-1-gdbc99e5b1
VMware NSX Network Topology - Supervisor Cluster on T0 - Guest Cluster on T0 VRF
VMware NSX - 1ESXi 2 TEP x (2 x Edge VM 2TEP) = 8 Tunnels
VMware NSX - Edge1 Tunnel Endpoint (4 to ESXi + 4 to Edge2 TEP)
VMware NSX - Host Transport Node - ESXi Details
VMware vCenter - Outer VCSA Virtual switches - Trunk vswitch (VMTRUNK vlan 4095 + 1731-Network vlan 1731) - vswitch0
VMware vCenter - Inner VCSA Virtual switches - 2 NSX Switch (aka N-VDS) Tanzu-VDS1 MGMT - Tanzu-VDS2 TEP
Tanzu-VDS1 - Ports
Tanzu-VDS2 - Ports
VMware vCenter - Distributed Port Groups per N-VDS
ESXi 28GB takes SupervisorVM + TKC VMs
01/21/2024 Scaling out the NSX Edge Cluster T0 in Active-Active mode, load balancing Edge TEPs and ESXi TEPs, and multiple automated deployments.
You can grab it from my fork page, or from William Lam's master repo in the PR section.
More screenshots will follow soon!
Cheers
01/08/2024 What's in the new VMware vSphere Foundation (VVF) and VMware Cloud Foundation (VCF) offers?
Blog post and diagrams made by William Lam to better grasp the new offerings: vmwa.re/skus, in addition to the recently published VMware KB 95927.
12/22/2023 Recover from an NSX-T unrecoverableCorfuError due to power loss or a storage issue in a singleton NSX Manager cluster
In an unforeseen event like a power outage or an underlying storage issue, there is a procedure to detach the faulty node, redeploy the VM, and rejoin the NSX-T Manager cluster (Replacing a faulty NSX-T Manager node in a VCF environment (78967)).
That said, in use cases with limited resources, the NSX-T Manager cluster may consist of only a single VM (see the bottom of this article for the documentation reference).
In that case there is no cluster to rejoin when the single VM is the one that is corrupt.
Only an NSX backup could restore the environment, if it has been set up! But what if NSX Backup has not been set up yet? What to do?
Here I present a simple trick, by no means supported I believe, that might allow us to recover the cluster from the unrecoverableCorfuError that occurs when NSX-T's CorfuDB database finds its files corrupted.
Symptoms:
The NSX UI is stuck with error 101:
You have a single-node cluster, that is one NSX-T Manager and not the recommended three.
Cluster status may show either error or down.
admin >
get cluster status verbose
Another example, with a nested VCF 4-node setup and the outer datastore disk full.
Impact / Risks
Some NSX configurations may get deleted.
The trick is 3 simple steps, plus 1 step to confirm once the NSX-T cluster is stable:
1. Stop the CorfuDB server service
root ~#
systemctl stop corfu-server.service
2. Delete/rename the last log segment
root ~#
cd /config/corfu/log
root ~#
ls -lrth
root ~#
mv /config/corfu/log/77.log /config/corfu/log/77.bak
3. Start the CorfuDB server service
root ~#
systemctl start corfu-server.service
4. Get the cluster status while waiting for it to become stable
admin >
get cluster status
After recovering to a stable cluster, we check that the "LEASE VERSION" column matches the new (clean) log that was generated, using the following command:
admin >
get cluster status verbose
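The steps above can be sketched as one script. This stays unsupported and lab-only, exactly like the manual procedure; the function name is mine, and it simply renames the newest numbered .log segment (77.log in the example above) so Corfu rebuilds it on start:

```shell
# Hedged sketch of steps 1-3: stop Corfu, rename the newest log segment, restart.
rotate_latest_corfu_log() {  # usage: rotate_latest_corfu_log /config/corfu/log
  dir=$1
  # Segments are numbered (75.log, 76.log, 77.log...); sort -n picks the newest.
  latest=$(ls "$dir" | grep '\.log$' | sort -n | tail -n 1)
  [ -n "$latest" ] || { echo "no .log segment found in $dir" >&2; return 1; }
  mv "$dir/$latest" "$dir/${latest%.log}.bak"
  echo "renamed $latest -> ${latest%.log}.bak"
}

# systemctl stop corfu-server.service
# rotate_latest_corfu_log /config/corfu/log
# systemctl start corfu-server.service
# then: get cluster status   (from the admin CLI, wait for stable)
```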
Also, there is a VMware by Broadcom Knowledge Base article, KB90840, covering the same UnrecoverableCorfuError due to an underlying storage issue on the corfu-nonconfig-server service.
The service name is corfu-nonconfig-server and the log directory is /nonconfig/corfu/corfu/log/.
I believe the same trick would work there as well.
Note:
First, this trick is not 100% reliable, or it would have been acknowledged as a workaround; and given the lengthy dozen minutes of waiting for the cluster to come up stable,
we often find it simpler to restart the NSX Manager.
What I noticed in this case is a large amount of read I/O, most likely caused by the sync happening at service start.
In my case these power outages came as frequently as 2-3 times per month, due to bad RAM and a heavy nesting environment causing BSODs.
And finally, this points out the importance of setting up NSX Backup and testing the Restore!
And note the placement of this SFTP backup server, as per the latest VMware NSX-T Reference Design:
7.3.4.4 Singleton NSX Manager
The resources required to run a cluster of three NSX Managers may represent a challenge in
small environments. Starting with NSX version 3.1, VMware supports deploying a single NSX
manager in production environments. This minimal deployment model relies on vSphere HA
and the backup and restore procedure to maintain an adequate level of high availability.
vSphere HA will protect against the failure of the physical host where the NSX manager is
running. vSphere HA will restart NSX Manager on a different available host. Enough resources
must be available on the surviving hosts; vSphere HA admission control can help ensure they
are available in case of failure.
Backup and restore procedures help in case of failure of the NSX manager itself. The SFTP
server where the backup is stored should not be placed on an infrastructure shared by the
single NSX Manager node.
A quick deep dive into CorfuDB history:
It is a log-appending database with fast performance
(think Kafka),
where the log consists not only of text but also of binary data.
And when I say fast, I mean CorfuDB can write dozens if not hundreds of thousands of times per second!
Source: https://github.com/CorfuDB/CorfuDB/wiki/White-papers
12/14/2023 VMware by Broadcom Flings Continue
VMTN Flings
12/11/2023 VMware by Broadcom Dramatically Simplifies Offer Lineup and Licensing Model
By Krish Prasad, Senior Vice President and General Manager, VMware Cloud Foundation Division: VMware by Broadcom business transformation
Desktop Hypervisor Continue
11/22/2023 Broadcom announces successful acquisition of VMware
Hock Tan, President and Chief Executive Officer: "Providing best-in-class solutions for our customers, partners and the industry"
11/12/2023 VMware Explore 2023 Breakout Session URLs
Links to videos with Customer Connect account and direct download links to supporting presentation slides.
VMware Explore EMEA 2023 Breakout Session URLs
VMware Explore US 2023 Breakout Session URLs
11/07/2023 Updated script: Automated Tanzu Lab Deployment with NSX VRF, Project, VPC
11/11/2023 Update: now merged into William Lam's master repo! https://github.com/lamw/vsphere-with-tanzu-nsxt-automated-lab-deployment
My Fork with Branch NSX4 github.com/abbedsedk/vsphere-with-tanzu-nsxt-automated-lab-deployment/tree/nsx4
- Updated for vSphere 8.0 and NSX 4.1.1, due to API changes since vSphere 7 and NSX 3.
- Added a few checks to allow reuse of existing objects like vCenter VDS, VDPortGroup, StoragePolicy, Tag and TagCategory, and NSX TransportNodeProfile.
- Added a FAQ on creating multiple clusters using the same VDS/VDPortGroup. This allows multi Kubernetes cluster high availability with vSphere Zones and Workload Enablement.
- Added a few pauses for the use case where we deploy only a new cluster: to allow the Nested ESXi to boot and fully come online (180s), and before VSAN Diskgroup creation (30s).
- Added FTT configuration for VSAN, allowing 0 redundancy so a single node can serve as a demo lab VSAN cluster. (This allows the whole nested Multi-AZ Tanzu lab with NSX VRF, Project, VPC to run on a 128GB box; the play-by-play of this use case is next.)
$hostFailuresToTolerate = 0
- Added a pause to the script as a no-babysitting workaround for owners of AMD Zen DPDK FastPath capable CPUs.
$NSXTEdgeAmdZenPause = 0
- Added the -DownloadContentOnDemand option for the TKG Content Library, to avoid downloading 250GB in advance and reduce it to a few GB.
- Added automated T0 VRF Gateway creation with a static route like the parent T0 (note: an uplink segment '$NetworkSegmentProjectVRF' is connected to the parent T0 for connectivity to the outside world).
- Added automated Project and VPC creation.
11/07/2023 A use case: vSphere with Tanzu using NSX Project VPC networks, with multi-K8s-cluster high availability using vSphere Zones
- Deploy the 1st VSAN Cluster (+1h), vSphere with Tanzu using NSX-T Automated Lab, same as before
- Deploy the 2nd and 3rd VSAN Clusters (15 min each), vSphere with Tanzu using NSX-T Automated Lab
- To do after the 3 cluster deployments
- Deploy NSX T0 VRF and Project and VPC Subnets Segments IP Blocks (3 min)
- Create the Zonal Storage Policy Multi-AZ-Storage-Policy
- Create 3 zones with the 3 clusters
- Workload Control Plane (WCP) Enablement in Workload Management
- Enablement: Beginning to Ready
- Next Enterprise Developer's Tasks: Give a name to a Namespace, deploy a Class-Based or Tanzu Kubernetes Cluster (TKC), and deploy a stateful app with cluster HA.
- Next Service Provider's Tasks: Create a provider VDC backed by a Supervisor Cluster, publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director, and offer Kubernetes as a Service (CaaS).
In the following section we will do a three-zone Supervisor deployment type.
Deploy 1st Cluster using vSphere with Tanzu using NSX-T Automated Lab same as before
With 3 nested ESXi; or, if fitting into a 128GB memory box is a requirement, specify only 1 ESXi hostname/IP, which is possible with $hostFailuresToTolerate = 0.
Fill the value of these 3 variables
$NestedESXiHostnameToIPs = @{...}
$NewVCVSANClusterName = "Workload-Cluster-1"
$vsanDatastoreName = "vsanDatastore-1"
1st Cluster
Now, to deploy the 2nd and 3rd clusters, follow these steps:
- Change the values of these 3 variables for the 2nd and 3rd cluster deployments,
- Change the VAppName to a fixed value,
- Change the value of the already deployed VMs (VCSA, NSXManager, NSXEdge) to 0,
- In postDeployNSXConfig, change all variables from $true to $false except $runHealth, $runTransportNodeProfile, and $runAddEsxiTransportNode.
$NestedESXiHostnameToIPs = @{...}
$NewVCVSANClusterName = "Workload-Cluster-2"
$vsanDatastoreName = "vsanDatastore-2"
$VAppName = "Nested-vSphere-with-Tanzu-NSX-T-Lab-qnateilb" # "Nested-vSphere-with-Tanzu-NSX-T-Lab-$random_string" # A random string can be used on the first cluster, but reuse the same $VAppName for the 2nd and 3rd cluster deployments.
$preCheck = 1
$confirmDeployment = 1
$deployNestedESXiVMs = 1
$deployVCSA = 0
$setupNewVC = 1
$addESXiHostsToVC = 1
$configureVSANDiskGroup = 1
$configureVDS = 1
$clearVSANHealthCheckAlarm = 1
$setupTanzuStoragePolicy = 1
$setupTKGContentLibrary = 1
$deployNSXManager = 0
$deployNSXEdge = 0
$postDeployNSXConfig = 1
$setupTanzu = 1
$moveVMsIntovApp = 1
$deployProjectExternalIPBlocksConfig = 0
$deployProject = 0
$deployVpc = 0
$deployVpcSubnetPublic = 0
$deployVpcSubnetPrivate = 0
if($postDeployNSXConfig -eq 1) {
    $runHealth=$true
    $runCEIP=$false
    $runAddVC=$false
    $runIPPool=$false
    $runTransportZone=$false
    $runUplinkProfile=$false
    $runTransportNodeProfile=$true
    $runAddEsxiTransportNode=$true
    $runAddEdgeTransportNode=$false
    $runAddEdgeCluster=$false
    $runNetworkSegment=$false
    $runT0Gateway=$false
    $runT0StaticRoute=$false
    $registervCenterOIDC=$false
}
2nd Cluster
3rd Cluster
NSX View
VCENTER View
Todo after the 3 cluster deployments:
- ESXi -> Configure -> TCP/IP Configuration -> IPV6 CONFIGURATION -> Disable
- ESXi -> Configure -> TCP/IP Configuration -> Default -> Edit -> copy 'Search domains' to 'Domain'
- ESXi -> Configure -> TCP/IP Configuration -> Default -> Edit -> swap the Preferred and Alternate DNS servers if needed. (In my case this was part of why Workload Enablement wouldn't come up.)
- SSH to the ESXi hosts and reboot them via 'Send to all' in MultiTab PuTTY, pressing Enter in each ESXi tab
- Snapshot/Export the outer ESXi VM or the Lab vApp
- Start the Lab vApp and reset the alarms
- SSH to the virtual router (I use VyOS) and configure a static route for each Project and VPC Subnet IP/netmask via $T0GatewayInterfaceAddress (in my case this was the other part of why Workload Enablement wouldn't come up).
- Deploy NSX T0 VRF and Project and VPC Subnets Segments IP Blocks (3 min)
- Deploy NSX T0 VRF and Project and VPC Subnets Segments IP Blocks (3 min)
Fill the variables of section:
# Project ,Public Ip Block, Private Ip Block
# VPC, Public Subnet, Private Subnet
VMware Docs - VMware-NSX 4.1 - Add a Subnet in an NSX VPC
Self Service Consumption with Virtual Private Clouds Powered by NSX
(Gotcha: $VpcPublicSubnetIpaddresses must be a subset of $ProjectPUBcidr, and can't use the first or last subnet block size.)
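The subset part of that gotcha can be checked before running the script. A hedged helper (function names are mine; the CIDRs in the example calls are illustrative, not my lab values; it does not check the first/last-block restriction, only containment):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() { set -- $(echo "$1" | tr '.' ' '); echo $(( ($1<<24) + ($2<<16) + ($3<<8) + $4 )); }

# in_block CHILD_CIDR PARENT_CIDR -> prints yes if CHILD is inside PARENT.
in_block() {
  child=${1%/*};  clen=${1#*/}
  parent=${2%/*}; plen=${2#*/}
  # A child block must be at least as long (small) as the parent prefix.
  [ "$clen" -ge "$plen" ] || { echo no; return; }
  mask=$(( (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$child") & mask )) -eq $(( $(ip2int "$parent") & mask )) ] \
    && echo yes || echo no
}

in_block 172.16.20.64/28 172.16.20.0/24   # yes
in_block 172.16.21.0/28  172.16.20.0/24   # no
```

Run it with your $VpcPublicSubnetIpaddresses as CHILD and $ProjectPUBcidr as PARENT before confirming the deployment.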
# T0 VRF Gateway
# Which T0 to use for the Project's external connectivity: $T0GatewayName or $T0GatewayVRFName (this option is important, as it determines whether the T0 VRF Gateway gets created or not.)
$ProjectT0 = $T0GatewayVRFName
Change the values of all variables to 0, and set to 1 only $preCheck, $confirmDeployment, and the Project and VPC ones.
$VAppName = "Nested-vSphere-with-Tanzu-NSX-T-Lab-qnateilb" # "Nested-vSphere-with-Tanzu-NSX-T-Lab-$random_string" # A random string can be used on the first cluster, but reuse the same $VAppName for the 2nd and 3rd cluster deployments.
$preCheck = 1
$confirmDeployment = 1
$deployNestedESXiVMs = 0
$deployVCSA = 0
$setupNewVC = 0
$addESXiHostsToVC = 0
$configureVSANDiskGroup = 0
$configureVDS = 0
$clearVSANHealthCheckAlarm = 0
$setupTanzuStoragePolicy = 0
$setupTKGContentLibrary = 0
$deployNSXManager = 0
$deployNSXEdge = 0
$postDeployNSXConfig = 0
$setupTanzu = 0
$moveVMsIntovApp = 0
$deployProjectExternalIPBlocksConfig = 1
$deployProject = 1
$deployVpc = 1
$deployVpcSubnetPublic = 1
$deployVpcSubnetPrivate = 1
Note: Screenshot the summary before confirming, as a reminder of the Subnet IP/Netmask later.
Deploy the VRF, Project, and VPC with all associated networking (IP Blocks, Segments, Subnets, Routing, DHCP) in 3.27 minutes.
An assortment of NSX API calls from 2 PowerCLI modules and from straight REST calls.
NSX Topology T0/VRF - Project - VPC
- Create the Zonal Storage Policy "Multi-AZ-Storage-Policy" -> No redundancy (if you configured FTT = 0 on a one-node cluster)
VMware Docs - VMware-vSphere 8.0 - Create Storage Policies for vSphere with Tanzu
VMware Docs - VMware-vSphere 8.0 - Deploy a Three-Zone Supervisor with NSX Networking
- Create 3 zones with the 3 Clusters
Workload Control Plane (WCP) Enablement in Workload Management
Enablement: Beginning to Ready
Next Developers' Tasks: Give a name to a Namespace, deploy a Class-Based or Tanzu Kubernetes Cluster (TKC), and deploy a stateful app with cluster HA.
vSphere with Supervisor Cluster Configuration Files
Next Service Provider's Tasks: Create a provider VDC backed by a Supervisor Cluster, Publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director, Offers Kubernetes as a Service (CaaS).
Publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director
04/01/2023 Added an Export option to the Nested VCF Lab VMs PR script.
The option can be set to run right after the deployment, or at a later time, which is preferred for saving a state of the lab VMs as OVA.
A FAQ was added to explain how to set the option.
Note that the script is coded to export the VMs of the latest vApp deployed by the script whose name starts with Nested-VCF-Lab-.
15 min to stop, export as OVA, and start the VMs back up.
03/27/2023 Enable multiple vApp deployments on the same cluster
Because I was unable to deploy multiple times, I created an issue and then a PR that got merged.
03/05/2023 Comparing CPU and I/O usage during VCF SDDC Management Bringup on 4 vs 1 nested ESXi nodes
Follow-up on the previous issue of 02/14/2023.
Found the root cause to be a nested-lab-environment case of CPU/I/O contention on the hosts,
occurring on a task towards the end of the bringup called "Configure Base Install Image Repository on SDDC Manager",
which copies the vCSA ISO and NSX OVA to an NFS share on the 4 nested ESXi vSAN datastore.
That drove the CPU through the roof, and consequently applications running in the three VMs (vCenter, NSX and SDDC Manager) had their kernels stuck at one point or several.
Looking deeper into it, I think the subsequent tasks might have had issues with the kernel-stuck VMs (I feel there may be missing pieces to understand it all...).
I was monitoring while that contention happened, then made screenshots of CPU and I/O usage of 2 SDDC bringups at the time of that copy task to illustrate:
one when the whole issue occurred with 4 nested ESXi,
one with 1 nested ESXi using the FTT=0 trick given by William Lam.
Using fewer vCPUs (8 instead of 4x8) and an NVMe SSD capable of faster I/O (PCIe 4.0 instead of 3.0) confirmed that, without kernels getting stuck, all is well.
I think that on real gear this should not happen.
03/03/2023 PCIE 4.0 LAB UPGRADE - AMD Ryzen 3700X + Netac NV7000
B.O.M 308€
AMD Ryzen 7 3700X, 3.6 GHz, 7 nm, L3 = 32MB, at 158€
Netac SSD 2TB M.2 NVMe PCIe 4.0 x4 at 150€
Ordered on 02/11/2023 and received 03/03/2023, but it was worth the wait; not only did it come from the official Netac store, but on the back it says Quality Check "QC PASS 02/2023".
Note you have to have a PCIe 4.0 capable motherboard; I chose my MSI X570 just for that, and for the fact that it ran my older Ryzen 2700.
What to expect of this speedup, I mean from PCIe 3.0 at 2000MB/s to PCIe 4.0 at 7000MB/s sequential read/write throughput? Not quite that, because we all know OSes do mixed random 4KB reads/writes;
nevertheless, VCF Nested deploys twice as fast, in 15 minutes instead of 30, because the bandwidth is twice as fast 😀.
02/24/2023 VMware Cloud Foundation with a single ESXi host for Workload Management Domain made by William Lam.
02/23/2023 Removing NSX CPU/Memory reservations when deploying a VMware Cloud Foundation (VCF) Management or Workload Domain made by William Lam.
as a final step, to get the modified NSX OVA into the overlay part of "/mnt/iso/", known as "/upper/", from "/work/".
/mnt/iso/...ova # the bringup sees this directory, which is a combination of the following 'oldiso' RO + 'upper' RW directories
|
/root/oldiso/...ova # read only filesystems
+
/overlay/upper/...ova # read write filesystems
/overlay/work/work/...ova # read write filesystems
I simply issued a "cp" of the OVA from "/work" to "/upper", which is writable, and it was presented in "/mnt/iso"; thus
I shared on that page what worked for me.
Feel free to check it out; it not only removes the NSX reservation for the 'Workload Management Domain' bringup, but
also for the later subsequent 'Workload VI Domain', which is desirable with the limited resources of a lab environment.
And now, like "Neo" in "The Matrix" learning "Jiu-Jitsu", I can say "Yay, I know Linux overlay filesystems!" and how to make read-only writable (just a side-note pointer: Docker uses exactly this for its layering).
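The same oldiso(RO) + upper(RW) -> merged layering can be reproduced on any Linux box with a throwaway overlay mount. A minimal sketch, assuming root privileges; the paths and file names here are illustrative, not the Cloud Builder ones:

```shell
# Recreate the RO-lower + RW-upper -> merged view with a scratch overlay (run as root).
mkdir -p /tmp/demo/lower /tmp/demo/upper /tmp/demo/work /tmp/demo/merged
echo "original" > /tmp/demo/lower/nsx.ova          # stands in for the read-only OVA
mount -t overlay overlay \
  -o lowerdir=/tmp/demo/lower,upperdir=/tmp/demo/upper,workdir=/tmp/demo/work \
  /tmp/demo/merged
# Copying a modified file into the upper (RW) layer makes it win in the merged view,
# just like cp'ing the patched OVA from /work into /upper exposed it under /mnt/iso.
echo "modified" > /tmp/demo/upper/nsx.ova
cat /tmp/demo/merged/nsx.ova                       # shows the upper-layer copy
umount /tmp/demo/merged
```

The workdir must be an empty directory on the same filesystem as the upperdir; overlayfs uses it for atomic copy-up operations.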
02/14/2023 - SDDC Manager 8 accounts disconnected
Just as in the post before, click on the 3 dots and REMEDIATE using the same password used in the deployment script.
Steps to recover expired Service Accounts in VMware Cloud Foundation (KB 83615)
SSH into each of the 4 Nested ESXi
[root@vcf-m01-esx01:~] passwd svc-vcf-vcf-m01-esx01
Changing password for svc-vcf-vcf-m01-esx01
Enter new password:
Re-type new password:
passwd: password updated successfully
(note I didn't do the reset-failed-login part)
SDDC Manager ESXi svc accounts -> 3 dots, REMEDIATE with this newly created password.
We must be logged in with another SSO user with the ADMIN role
to be able to click REMEDIATE on PSC administrator@vsphere.local.
I think a proper SSO ADMIN user like vcf-secure-user@vsphere.local, illustrated in the KB, is the way to go in production.
In my case, since it was a lab, I found an SSO account and promoted it to the ADMIN role.
Disclaimer: I do not know if that is supported, even though
from the remediate-password window we learn that the service account password will be rotated after the remediate,
so we can remove the ADMIN role from this service account afterwards.
Using a) the SDDC Manager UI or b) the vCenter UI, it's easily done instead of the API.
a) SDDC manager UI as administrator@vsphere.local -> Single Sign On -> +USERS AND GROUPS -> Search User: svc , Refine search by: Single User, Domain: vsphere.local
Select the user svc-vcf-m01-nsx01-vcf-m01-vc01 -> Choose Role: ADMIN (note this can be done from vCenter, see below), then click ADD.
b) vCenter UI as administrator@vsphere.local -> Licensing -> Single Sign On -> Users and Groups -> Users -> Domain: vsphere.local, Find: svc -> EDIT: Password, Confirm Password
c) SDDC manager UI login as svc-vcf-m01-nsx01-vcf-m01-vc01@vsphere.local -> Security -> Password Management -> PSC -> administrator@vsphere.local -> REMEDIATE again using the same original password
d) logout
optionally e) redo a), but select the 3 dots and remove the ADMIN role from this service SSO user.
Update
So mine falls under expected behavior, because I didn't give it a chance to sync and refresh after the deployment (less than 24h had passed).
Lesson learned: if this happens again, I will wait 24h before taking action.
I related this to someone on VMTN experiencing a similar effect, reporting disconnected accounts on VCF 4.5.0.
02/18/2023 - Importing VMs Vyos and nested ESXi, Checking and Configuring NTP
First, the OVA import wizard doesn't need to be filled in, as the defaults are already set for our environment.
Vyos
One thing to do on the VyOS console
is to remove any occurrence of the old MAC address "hw-id" and
any new interfaces in the config.boot file:
"vi /config/config.boot", then
the "dd" command to delete a line, then
save it with ":" "wq!"
"configure"
"load /config/config.boot"
"commit"
"save"
"exit"
"reboot"
Note: you have to learn where the US QWERTY keys are if you have an AZERTY keyboard, or be sure to load your regional keymap with "sudo loadkeys fr" ("fr" for the French keymap).
nested ESXi and check NTP
One thing to do on all nested ESXi VMs upon import as well:
SSH into each of them to permanently remount the OS volume, with this one-liner for example, and recheck NTP.
using Multi Tabbed Putty mtputty
ssh all 4 nestedesxi
tick send to all
UUID=$( esxcfg-volume -l | grep UUID | cut -b 17-52 ); esxcfg-volume -M $UUID
hit ENTER
ssh cb
tick send to all
ntpq -p
hit ENTER
At this point, not all ESXi hosts had NTP running, or even set up, or it was sitting in the INIT state.
Configure NTP server on nested ESXi
We're tempted to edit ntp.conf, but there is a comment that tells us not to:
[root@vcf-m01-esx02:~]
cat /etc/ntp.conf
# Do not edit this file, config store overwites it
So how do we do it?
Troubleshooting NTP on ESX and ESXi 6.x / 7.x / 8.x (KB 1005092)
For builds 7.0.3 onwards,
this KB explains how to add the "tos maxdist 15" setting,
so we can use this same method to configure the server setting:
/etc/init.d/ntpd restart
NTPold="`cat /etc/ntp.conf | grep server`"
NTPprefered="server 0.pool.ntp.org"
cp /etc/ntp.conf /etc/ntp.conf.bak -f && sed -i 's/'"$NTPold"'/'"$NTPprefered"'/' /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak
cp /etc/ntp.conf /etc/ntp.conf.bak -f && echo "tos maxdist 15" >> /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak
esxcli system ntp set -e 0 && esxcli system ntp set -e 1
/etc/init.d/ntpd restart
ntpq -p
NTP service auto start is not working in ESXi 7.0 (KB 80189)
chkconfig --list ntpd
chkconfig ntpd on
reboot
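The edit-a-copy-then-apply pattern above can be sanity-checked on any Linux box before touching a host. A sketch using a scratch file (the stand-in server line is made up; on ESXi the edited copy would then be fed to `esxcli system ntp set -f`):

```shell
# Simulate the KB method on a throwaway copy of ntp.conf.
conf=$(mktemp)
printf 'server 127.0.0.1\n' > "$conf"        # stand-in for the existing server line
NTPold=$(grep '^server' "$conf")
NTPprefered="server 0.pool.ntp.org"
sed -i "s/$NTPold/$NTPprefered/" "$conf"     # swap the server entry
echo "tos maxdist 15" >> "$conf"             # append the KB's tolerance setting
cat "$conf"                                  # shows the rewritten server line plus tos maxdist 15
```

Note the substitution would break if the matched line contained sed metacharacters such as `/`; for plain `server <host>` lines it is safe.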
That's it, you're set for success! Remember, before you begin the bringup, to shut down all the VMs and snapshot them all, just to be safe!
02/09/2023 UPDATE - Contributed to William's vSphere with Tanzu using NSX-T Automated Lab Deployment script to allow additional Edge node creation. Now merged.
02/08/2023 - SDDC Manager account disconnected NSXT MANAGER
The trick here is to understand the text "Specify the password that was set manually on the component": it means the same password we set in the deployment script, rather than what the misleading warning suggests.
02/06/2023 UPDATE - Finally, the solution to the NSX installation and HA agent install issues on ESXi: they were due to a lack of memory.
Clicking on the NSX install failure, we see that the ESXi host is lacking memory.
This 2nd ESXi node happened to be the one hosting the NSX VM, but it had more than 13GB of free memory.
We can work around this issue by live migrating the NSX vm to the 3rd ESXi node, and then hit the Resolve Button
We see an unknown node status, but from KB 94377 we learn that is health check issue.
Next, the install of the HA agent onto this exact same 3rd ESXi node fails.
I was thinking of doing the same live-migration trick with NSX, but it was not possible, so I shut down NSX and migrated it to the 4th ESXi node.
But then it wouldn't power on, needing an extra ~200MB.
Looking at the 4th ESXi node, there was apparently plenty of memory: 28.7GB.
At that point I was curious; from vCenter I enabled the SSH service (since it's stopped during bringup) to have a look at the available memory reservation for the user namespace, using this command found on VMTN:
memstats -r group-stats -g0 -l2 -s gid:name:parGid:nChild:min:max:conResv:availResv:memSize -u mb 2> /dev/null | sed -n '/^-\+/,/.*\n/p'
I figured out that NSX needs 16384MB of reservation, while here we see 16372MB of reservation available + 178MB of overhead:
16384 - 16372 + 178 = 190MB, roughly the ~200MB that explains why the vCenter admission failure wouldn't let the NSX VM power on.
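The admission-control arithmetic can be checked in a line of shell (numbers in MB, taken from the memstats output above):

```shell
# required reservation - available reservation + VM overhead = admission shortfall
required=16384; available=16372; overhead=178
echo $(( required - available + overhead ))   # prints 190, i.e. the ~200MB the power-on lacked
```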
The solution is easy: just bump the ESXi memory a bit more. At the time I was testing 42GB, so I redid the lab with 46GB and it worked flawlessly on these tasks.
02/04/2023 UPDATE - Note I do not recommend doing the bringup with the nested environment nested again, like in the picture above, for better performance (having 3 hypervisors "in a row" is only meant for a lite lab deployment 😀). Below I'll explain how to deploy, then modify the Cloud Builder timeouts, then export.
This is also to avoid the "BUG: soft lockup", which feels pretty much like a PSOD with some nasty effects (more on that later in the disconnected-account posts here and here), and many other issues that could arise during bringup. Clearly, the expected I/O throughput is at minimum in the 100s of MB/s, not the 10s of MB/s.
Export the Nested VCF Lab's VMs
If you know how to connect to the virtual infrastructure, then it can be done with a PowerShell one-liner to export the vApp:
Get-VM -Name vcf-m01-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null
Update 04/01/2023 PR Done
Customization pre export:
Use multi tabbed SSH client, on Windows MTPuTTY is free.
For the Cloud Builder VM, SSH to it and extend these two timeouts:
sed -i 's/ovf.deployment.timeout.period.in.minutes=40/ovf.deployment.timeout.period.in.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i -e 's/nsxt.disable.certificate.validation=true/nsxt.disable.certificate.validation=true\nnsxt.manager.wait.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
systemctl status vcf-bringup
systemctl restart vcf-bringup
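The second sed's trick of appending a brand-new property line can be previewed on a scratch file before touching the appliance. A sketch (GNU sed interprets \n in the replacement as a newline):

```shell
# Show that the replacement splits one property line into two.
f=$(mktemp)
echo 'nsxt.disable.certificate.validation=true' > "$f"
sed -i -e 's/nsxt.disable.certificate.validation=true/nsxt.disable.certificate.validation=true\nnsxt.manager.wait.minutes=180/' "$f"
cat "$f"   # now two lines: the original property plus nsxt.manager.wait.minutes=180
```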
After the customization of the VM is done and CB validation is all green, rerun the script with all options set to 0 except:
$preCheck = 1
$confirmDeployment = 1
$exportVMs = 1
Export the Virtual router's VMs
Additionally, also export your virtual router(s); in my case it is a CSR1000v, assuming they are deployed with the naming convention csr-*.
Get-VM -Name csr-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null
Get-VM -Name vyos-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null
01/25/2023 UPDATE - Good news: a new version of the Automated VMware Cloud Foundation Lab Deployment script is already here!
I just asked for it a few days ago here, then shared some of these tips on William Lam's website, and on the same day (would you believe it?) a PR and a merge made it happen! The virtualization community is fast 😀. This version includes fixes for steps 1, 3, 4 (need to follow the KB; I chose option 2, patch with WinSCP, or integrate it into the OVA), 5 and 7.
01/21/2023 - VCF v4.5 Lab
PHYSICAL LAB B.O.M 900€ (GPU & HDD not counted)
• RYZEN 2700 BOX (230€), officially supports 64GB but it takes
• 128GB DDR4 Patriot, 4 x 32GB at 100€ each, with a few MEMORY MANAGEMENT BSODs
• MOTHERBOARD MSI X570 (170€)
• 1TB SSD NVMe M.2 Micron P1 (100€) (100GB for OS and 831GB for LAB that became full! I got a story)
VM DC+DNS+iSCSI+NFS 2vCPUs 2GB
VM HOST-ESXI+VCSA 16vCPUs 104GB
For Router specifically
1 adapter, not tagged, for management
8 adapters on a trunk port group VLAN 4095 (coming from the Windows VMware Workstation VMnet adapter configuration: Jumbo + VLAN 4095 + all IP protocols unchecked)
7 configured as sub-interfaces with dot1q tags corresponding to the VLANs desired for the bringup
1 configured as a trunk
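As an illustration of one such dot1q sub-interface, in VyOS syntax (the interface name, VLAN ID, address and MTU here are hypothetical, not the lab's actual values; a CSR1000v would use `encapsulation dot1Q` on a sub-interface instead):

```shell
# VyOS: a dot1q sub-interface (vif) on the trunk-facing NIC, one per bringup VLAN
set interfaces ethernet eth1 vif 1611 address '172.16.11.1/24'
set interfaces ethernet eth1 mtu '9000'
commit
save
```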
For Nested ESXi specifically
4 adapter on trunk port group
1. After deployment Automated VMware Cloud Foundation Lab Deployment
Open the outer vCenter and change the 1st disk from 12GB to 32GB in the Nested ESXi VMs, or Cloud Builder fails with "VSAN_MIN_BOOT_DISKS.error".
2. Change the 3rd (vSAN capacity) disk from 60GB to more than 150GB if the Nested ESXi are nested themselves in an ESXi VM!
(I go full Inception movie, running the outer ESXi in a VM on Windows VMware Workstation. The advantage of snapshotting the whole thing is
significantly appreciated, especially for the VCF bringup, but the slowness less so.) Regarding speed, I'm looking forward to trying PCIe 4.0 NVMe
once I upgrade my CPU to a 3700X, to speed up some tasks of the bringup and avoid some related CPU issues (Windows BSOD).
3. Change all four SddcManager passwords to ones as strong as the NSX ones.
4. I got "Gateway IP Management not contactable" -> patch it with KB 89990 (release notes).
5. Failed VSAN Diskgroup -> “esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 1” on the Outer ESXi.
6. For DUP “esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1”
7. Instead of DHCP, use an IP Pool; in the VMware Cloud Foundation API Reference Guide, under SDDC, look for
"ESXi Host Overlay TEP IP Pool".
8. Use a router IP as NTP for VCF, but configure on the router a reliable external NTP server with a good stratum.
9. After Validation All green, Before launching the bringup Modify some CloudBuilder timeout:
vim /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i 's/ovf.deployment.timeout.period.in.minutes=40/ovf.deployment.timeout.period.in.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i -e 's/nsxt.disable.certificate.validation=true/nsxt.disable.certificate.validation=true\nnsxt.manager.wait.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
echo "bringup.mgmt.cluster.minimum.size=1" >> /etc/vmware/vcf/bringup/application.properties
systemctl restart vcf-bringup
watch "systemctl status vcf-bringup"
tail -f /opt/vmware/bringup/logs/vcf-bringup-debug.log
10. Disable automatic DRS for VC, NSX and SDDC Manager after each deployment
in the inner vCenter, or else DRS will rebalance those critical VMs while the others are being deployed:
Cluster -> Configure -> VM Overrides -> Automatic DRS -> Disabled or Manual
vcf-m01-vc01
vcf-m01-nsx01a
vcf-m01-sddcm01
01/19/2023 - Deploying Cloud Director in small form factor: the troubleshooting
This issue arises due to slow NFS access and a lack of CPU for the initial primary cell boot. Encountered in version 10.4.
Long story short, issue this command to relax the NFS access timeout:
sed -i 's/10s/60s/' /opt/vmware/appliance/bin/appliance-sync.sh
and bump up the vCPUs from 2 to 4.
The best way to avoid tinkering with the appliance script files is to give it at least 4 vCPUs before deploying; while there is a hard-coded value of 8 CPUs, I determined that 4 is sufficient based on the top utility showing 400% CPU usage, meaning 4 x 100% of one CPU core.
I had previously answered this issue on VMTN, where it was found helpful. VMware Technology Network > Cloud & SDDC > vCloud > VMware vCloud Director Discussions > Re: Configure-vcd script failed to complete