53: Nutanix Weekly: Step-by-Step Guide to Deploying Nutanix Metro Availability

May 11, 2022

Host: Andy Whiteside
Co-host: Harvey Green
Co-host: Jirah Cox


00:00:02.520 –> 00:00:06.629
Andy Whiteside: Everyone, welcome to episode 53 of Nutanix Weekly. I’m your host, Andy Whiteside. I’ve got

00:00:07.589 –> 00:00:24.630
Andy Whiteside: Harvey Green with me, who is kind of local; he’s, I don’t know, 45 minutes away by car. And Jirah’s now two and a half, three hours away by car. We were just debating how to define local in 2022. Harvey, what was your answer to how you define local in 2022?

00:00:25.080 –> 00:00:38.130
Harvey Green: I told Jirah that the latency between where he is and where I am, round trip in my car, isn’t what it used to be at all, so he’s not as local as he used to be.

00:00:38.880 –> 00:00:45.180
Jirah Cox: If I recall my high school physics, saying I’m two and a half to three hours away makes certain assumptions about speed.

00:00:46.410 –> 00:00:46.890
Andy Whiteside: Well yeah.

00:00:47.550 –> 00:00:49.290
Harvey Green: I defined it as in my car.

00:00:51.120 –> 00:01:01.110
Andy Whiteside: Well, and you’re really saying, how fast can I get to you, in this case get there and get back, and have somebody acknowledge that you got there. And this is really good for what we’re going to talk about here in a second.

00:01:03.240 –> 00:01:10.020
Andy Whiteside: But you know, Harvey, I’m not doing it today, but let’s say I’m about to leave my office and go pick up my kids at school. I know exactly how to get there.

00:01:10.320 –> 00:01:18.960
Andy Whiteside: Yes, but I don’t know what traffic is going to be like. Where you live it’s a nightmare, I’m sure; where I live now it’s probably a nightmare too.

00:01:19.590 –> 00:01:25.170
Andy Whiteside: You know, I don’t know the best way to get there, because I don’t know what the traffic situation is going to be. Therefore, I use smart software.

00:01:25.500 –> 00:01:34.680
Andy Whiteside: Okay, like an SD-WAN type of thing, but in this case I use GPS. It’s in my car, I turn it on even though I know where I’m going, and on occasion it’s like, oh no, don’t go that way, you need...

00:01:36.240 –> 00:01:40.290
Andy Whiteside: And I’ve got to be stubborn: oh, I’m going this way. And I go the way I know, and it’s like, oh no, car accident.

00:01:40.500 –> 00:01:41.970
Three hours sitting in traffic.

00:01:43.800 –> 00:01:51.420
Andy Whiteside: Which kind of maybe takes us to our topic for today. We were talking about stretched clusters the other week, and I brought up

00:01:52.770 –> 00:01:54.450
Andy Whiteside: Metro, or whatever it was called.

00:01:57.120 –> 00:02:07.800
Andy Whiteside: And Jirah has brought that topic for today from the Nutanix Community blogs: “Step-by-Step Guide to Deploying Nutanix Metro Availability.”

00:02:08.280 –> 00:02:17.970
Jirah Cox: Yes, Jeroen wrote up a fantastic blog post here. I love how our Community blog engine assigned him the label of trendsetter. Fantastic.

00:02:18.750 –> 00:02:33.450
Jirah Cox: Kudos, Jeroen, on your trendsetter status. And yeah, I think there’s another fun connection to make here with you driving somewhere you’ve already learned how to get to: the value is the software you called out, but also the data, right, the real-time data of, like,

00:02:34.530 –> 00:02:46.020
Jirah Cox: how are the roads running, right, and what do I want to avoid as I get from A to B. That has a lot of applicability, right, because we can use a metro cluster, obviously, for

00:02:47.040 –> 00:02:56.910
Jirah Cox: worst-case scenarios, but there’s value even short of worst-case scenarios as well, right? So we can use it for, say, planned maintenance, right? If I know that I need to

00:02:59.730 –> 00:03:08.670
Jirah Cox: get ahead of, say, generator maintenance or transfer switch or UPS maintenance, maybe I want to go ahead and move workloads over to the other data center anyway, right, rather than,

00:03:10.050 –> 00:03:15.030
Jirah Cox: you know, trust that the UPS maintenance is going to go totally, totally smoothly.

00:03:16.170 –> 00:03:23.910
Andy Whiteside: The goal is to get there and get there as fast as you can, but you want to get there in one piece so getting there is the most important part.

00:03:24.150 –> 00:03:26.370
Andy Whiteside: Yeah, we have become overly,

00:03:27.540 –> 00:03:37.320
Andy Whiteside: overly familiar with the idea, the acceptance, that we have to get there fast, and that’s today’s age. But getting there, getting there all in one piece, is the point.

00:03:38.730 –> 00:03:39.480
Harvey Green: Yes.

00:03:40.500 –> 00:03:43.290
Harvey Green: Totally. No packet or limb loss.

00:03:49.170 –> 00:03:50.760
Jirah Cox: I can agree, yeah.

00:03:54.210 –> 00:03:56.580
Jirah Cox: This podcast is officially against limb loss.

00:03:56.610 –> 00:03:58.890
Jirah Cox: Yes, I’ll plant that flag.

00:03:59.850 –> 00:04:01.590
Andy Whiteside: No blood and guts and gore.

00:04:02.880 –> 00:04:03.510
Harvey Green: that’s right.

00:04:03.690 –> 00:04:04.200
Jirah Cox: So to.

00:04:04.500 –> 00:04:05.910
Andy Whiteside: And no data loss either, though.

00:04:06.480 –> 00:04:07.500
Jirah Cox: Totally no data loss.

00:04:07.680 –> 00:04:08.670
Jirah Cox: Know James what data loss.

00:04:08.970 –> 00:04:15.150
Jirah Cox: So, to start painting a word picture here, right, in Jeroen’s write-up here, what I have is,

00:04:15.810 –> 00:04:31.740
Jirah Cox: I think this is a lab deployment. So we start off with two Nutanix clusters: we want one cluster in, like, site A or availability zone A or data center A, and another cluster in site B, availability zone B, data center B, however you break them up.

00:04:32.790 –> 00:04:45.030
Jirah Cox: The entire point, or the end state, of deploying a Nutanix metro cluster, which is really two Nutanix clusters (in this case, this is using vSphere as the hypervisor, which is not required; you can do it on AHV as well),

00:04:46.440 –> 00:04:54.660
Jirah Cox: is the outcome where, you know, all the humans are asleep at three in the morning and one data center goes fully dark, loses power, let’s say.

00:04:55.590 –> 00:05:06.390
Jirah Cox: I want all workloads to automatically resume at the other data center and, to your point, Harvey, with zero data loss, right? So it’s just as if the VMs crashed at one site

00:05:06.900 –> 00:05:13.050
Jirah Cox: and they will power on at the other site, as if it was an HA event from node to node within one cluster, but we’re going to cross

00:05:13.380 –> 00:05:22.980
Jirah Cox: a larger geography than just within one cluster, right? We’re not going from blade to blade, node to node, rack to rack. This is going to go from data center A to data center B, right, is the

00:05:24.420 –> 00:05:28.920
Jirah Cox: expected and desired outcome. So to do that, we’ve got our clusters at each site.

00:05:29.940 –> 00:05:37.050
Jirah Cox: Because each cluster is an autonomous unit, right, I’m going to write, in this case, probably two copies of data in cluster A and two copies of data in cluster B,

00:05:37.620 –> 00:05:44.880
Jirah Cox: keep them in sync, and then with synchronous replication, right, no data loss: I wait for that remote acknowledgement.

00:05:45.300 –> 00:05:57.930
Jirah Cox: So if a VM is running at site A, I wait for site B to tell me, hey, I’ve got that data as well, before I acknowledge back to the guest VM, hey, I’ve got your data, right? So the write doesn’t succeed to the guest OS until we’ve got data written on both clusters.
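
The acknowledgement rule Jirah describes can be sketched in a few lines of Python. This is a toy model, not Nutanix code: the `Cluster` class and `synchronous_write` function are hypothetical names made up for illustration, and real clusters also handle timeouts, retries and RF copies that are left out here.

```python
class Cluster:
    """Toy stand-in for one storage cluster (hypothetical, not a Nutanix API)."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.blocks = {}

    def write(self, key, data):
        # An offline cluster cannot store or acknowledge anything.
        if not self.online:
            return False
        self.blocks[key] = data
        return True


def synchronous_write(local, remote, key, data):
    """Return True (ack to the guest) only if both clusters stored the data."""
    local_ok = local.write(key, data)
    remote_ok = remote.write(key, data)
    return local_ok and remote_ok


site_a = Cluster("site-a")
site_b = Cluster("site-b")
acked = synchronous_write(site_a, site_b, "vm1-block0", b"payload")
```

In this sketch the guest only sees `acked` as true once `site_b` has the block too, which is the "wait for the remote acknowledgement" behavior from the conversation.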

00:05:59.130 –> 00:06:00.480
Jirah Cox: So, within each cluster then.

00:06:01.860 –> 00:06:12.600
Jirah Cox: You know, then on each cluster I create a local datastore, basically, or a datastore that goes from site A to B, and then at site B I create a datastore container

00:06:12.960 –> 00:06:21.240
Jirah Cox: from site B to A, so I’ve got both my directions there. Jeroen uses the nomenclature metro 1-2 and metro 2-1.

00:06:22.290 –> 00:06:35.760
Jirah Cox: But basically it shows in the labeling, right, for a human to see, these are VMs that replicate from one site to another site and what the direction is there, so we can keep it straight as we’re, like, building a new VM in our VM creation wizard.

00:06:36.090 –> 00:06:47.490
Andy Whiteside: I’ve got cluster number one in building one, and then how far away can cluster number two be to do this, and how much redundancy do I have to have?

00:06:48.600 –> 00:06:49.200
Jirah Cox: So.

00:06:49.320 –> 00:06:50.790
Jirah Cox: The Nutanix answer is not set

00:06:50.880 –> 00:07:02.250
Jirah Cox: by how far away; it’s not a factor of distance, it’s a factor of latency. The book says we want to be five milliseconds away or less. In reality, we want to be as close to the “or less” as possible.

00:07:03.420 –> 00:07:06.630
Jirah Cox: If you told me that the link has four milliseconds of latency from end to end,

00:07:07.110 –> 00:07:20.970
Jirah Cox: then that’s four milliseconds you’re going to add to each and every single write for the life of the cluster, right? You’re slowing down all VM write operations by that delay, so we want that to be as low as possible. Zero milliseconds, one millisecond, is much, much better.
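
To put numbers on "four milliseconds added to every write", here is a small arithmetic sketch. The 1 ms local write time is an assumption picked for illustration, not a figure from the episode, and the function names are hypothetical.

```python
def write_latency_ms(local_write_ms, link_latency_ms):
    """Latency the guest sees when every write also waits on the inter-site link."""
    return local_write_ms + link_latency_ms


def serial_writes_per_second(latency_ms):
    """Upper bound for one thread issuing writes strictly one after another."""
    return 1000.0 / latency_ms


LOCAL_WRITE_MS = 1.0  # assumed local write time, for illustration only

local_only = serial_writes_per_second(write_latency_ms(LOCAL_WRITE_MS, 0.0))
with_4ms_link = serial_writes_per_second(write_latency_ms(LOCAL_WRITE_MS, 4.0))
```

Under that assumption a strictly serial writer drops from 1000 to 200 writes per second, which is why a delay that looks small on a WAN link matters so much to a storage system.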

00:07:21.240 –> 00:07:34.140
Harvey Green: So that’s very, very important that you bring that up, because, you know, one of the things that we talk about a lot with Nutanix is the acknowledgement of writes, and you kind of went into this a little bit too, but

00:07:34.680 –> 00:07:52.440
Harvey Green: you want to make sure that you’ve got the optimal setup and that you’re not killing yourself performance-wise to try to make this work, because you actually are waiting on that latency to take place before those writes are acknowledged. Is that right?

00:07:52.830 –> 00:08:03.090
Jirah Cox: 100%, yeah. If a VM lives in site one and you’ve configured it for synchronous replication to site two, then it’s going to wait until site two says, I’ve got that data, before it can proceed.

00:08:03.900 –> 00:08:19.890
Harvey Green: Yeah, so for all the people who haven’t been doing this very long, they say, well, four milliseconds, that’s nothing. I’ve got, you know, 30 or 40 or 50 or 60 between this site and this next site. Why would I care about four?

00:08:22.320 –> 00:08:24.450
Jirah Cox: Because it’s a lot for your storage system.

00:08:25.200 –> 00:08:30.480
Jirah Cox: If I started adding 60 milliseconds to everything you did on your laptop, you would be making a phone call

00:08:31.560 –> 00:08:32.940
Jirah Cox: or looking for a new laptop.

00:08:33.660 –> 00:08:40.080
Harvey Green: Right, so I guess the big answer behind that is that it’s four milliseconds on every write.

00:08:40.410 –> 00:08:42.600
Jirah Cox: Every single write.

00:08:42.780 –> 00:08:48.300
Andy Whiteside: So are we really talking fiber configurations in order to get this done, or can we do it over copper these days?

00:08:49.200 –> 00:08:58.770
Jirah Cox: The medium doesn’t matter too much. Well, to be at any kind of meaningful distance you probably will be on fiber anyway. I’m sure you could use copper, technically, to be properly agnostic.

00:09:00.390 –> 00:09:06.840
Jirah Cox: It almost becomes a function of availability planning, right, and, like, shared nothing. So

00:09:07.290 –> 00:09:17.070
Jirah Cox: could I go from, like, data center hall to data center hall? We have some regional data centers around here that have multiple, kind of, shared-nothing availability zones under one roof.

00:09:18.420 –> 00:09:28.320
Jirah Cox: And if you want to do that, of course, you totally could but normally most customers will be looking to go like across the city, maybe across the state, depending on what size of state you’re sitting in when you listen to this.

00:09:29.610 –> 00:09:31.740
Jirah Cox: New England, yes, Texas, maybe not.

00:09:33.090 –> 00:09:34.230
Jirah Cox: In general,

00:09:34.320 –> 00:09:38.040
Jirah Cox: 50-ish miles, I’ve heard, is kind of a typical translation of

00:09:39.180 –> 00:09:42.870
Jirah Cox: the kind of low latency you’d want to build this kind of a construct on top of.
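
For a rough feel of how distance translates into latency, the sketch below uses the common rule of thumb that light in optical fiber covers about 200 km per millisecond (roughly two thirds of its speed in a vacuum). It only models propagation delay over a straight run; real fiber routes are longer than the map distance and switches, optics and the storage stack add their own delay, so treat the results as a floor, not a design figure. The function names are hypothetical.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # rule-of-thumb propagation speed in fiber


def fiber_rtt_ms(miles):
    """Round-trip propagation delay over a fiber run of the given length."""
    km = miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_MS


def max_miles_for_rtt(rtt_ms):
    """Longest fiber run whose propagation delay alone fits in rtt_ms."""
    return rtt_ms * FIBER_KM_PER_MS / 2 / KM_PER_MILE


rtt_50_miles = fiber_rtt_ms(50)        # about 0.8 ms
reach_at_5_ms = max_miles_for_rtt(5)   # about 311 miles of fiber, propagation only
```

Seen this way, 50-ish miles leaves most of a five-millisecond budget for everything that is not pure propagation, which is consistent with wanting to sit well under the limit.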

00:09:45.270 –> 00:09:48.090
Andy Whiteside: And that RF, so, right,

00:09:49.110 –> 00:09:51.570
Andy Whiteside: redundancy factor of two.

00:09:51.630 –> 00:09:52.350
Harvey Green: Two, yeah.

00:09:52.590 –> 00:09:53.220
Andy Whiteside: Now, would you.

00:09:53.250 –> 00:09:56.580
Andy Whiteside: typically have, say, you’ve got to have at least redundancy factor of two on the other

00:09:56.580 –> 00:09:59.640
Andy Whiteside: side too, correct? And are we waiting on both of them to write, or just one of them?

00:10:00.930 –> 00:10:03.390
Jirah Cox: Both, but they occur, you know, simultaneously.

00:10:04.830 –> 00:10:15.480
Harvey Green: Yeah, so that latency we’re talking about, potentially that four milliseconds, only has to happen once, and then once it gets to the other side it’s writing that second write locally.

00:10:16.260 –> 00:10:20.610
Jirah Cox: Because, like, in some ways, don’t overthink it. It’s the cluster:

00:10:20.910 –> 00:10:25.980
Jirah Cox: I send my data to the other cluster, and that cluster won’t acknowledge it until the write is compliant with the RF.

00:10:27.390 –> 00:10:30.870
Jirah Cox: Right, so you sort of are waiting on both, but really it’s the cluster that has to acknowledge it.
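
The point that both sites write their RF copies "simultaneously" and the link is crossed only once can be expressed as a simple timing model. This is an illustrative sketch under assumed numbers, with a hypothetical function name; it is not a description of how the Nutanix I/O path is implemented.

```python
def metro_ack_ms(local_rf_ms, remote_rf_ms, link_rtt_ms):
    """Time until the guest sees the write acknowledged.

    The local RF writes and the remote leg run in parallel, so the slower
    of the two paths decides. The remote leg pays the link once, then the
    remote cluster makes its own RF copies locally.
    """
    remote_path_ms = link_rtt_ms + remote_rf_ms
    return max(local_rf_ms, remote_path_ms)


# Assumed figures: 1 ms for each cluster to satisfy RF2, 4 ms link.
ack_ms = metro_ack_ms(local_rf_ms=1.0, remote_rf_ms=1.0, link_rtt_ms=4.0)
```

With those assumed figures the acknowledgement takes 5 ms, not 4 ms per copy, because the remote cluster's second copy never crosses the link.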

00:10:31.770 –> 00:10:39.510
Andy Whiteside: Okay, so the cluster’s the brain. A quick question for you: I did a quick search; do we have to have Prism Central to pull this off or not?

00:10:39.660 –> 00:10:41.400
Jirah Cox: We actually do

00:10:42.660 –> 00:10:51.120
Jirah Cox: not, really, for this one, as Jeroen set it up in his lab. Yeah, not fully required. Recommended, as it adds a lot of value to every environment, yeah.

00:10:52.650 –> 00:11:02.070
Andy Whiteside: If you can see what I’m doing, I did a pathping out to yahoo.com, and I’m working my way down the chain to start seeing, you know, more than four milliseconds of latency.

00:11:03.210 –> 00:11:21.600
Jirah Cox: Well, so remember that often there’s some black magic going on there, right? If you, say, ping Google, 8.8.8.8, right, or 8.8.4.4, either one, that’s a magical IP that can have a responder closer to my house than, say, California.

00:11:21.900 –> 00:11:22.260
Jirah Cox: Right.

00:11:22.320 –> 00:11:23.670
Jirah Cox: So there’s some there’s some.

00:11:26.550 –> 00:11:29.580
Jirah Cox: I don’t know enough to know if that’s BGP funniness or...

00:11:30.300 –> 00:11:37.500
Andy Whiteside: I’m actually not trying Google. See, Yahoo in this case was 30-something milliseconds. I’m pinging things that I found

00:11:37.800 –> 00:11:39.510
Andy Whiteside: along the way.

00:11:45.090 –> 00:11:45.750
Andy Whiteside: Impressive, hmm.

00:11:46.230 –> 00:11:47.790
Jirah Cox: I know, the pathping is interesting.

00:11:48.000 –> 00:11:48.360
Jirah Cox: it’s like it’s.

00:11:49.320 –> 00:11:59.520
Andy Whiteside: Crazy, yeah. It’s kind of like traceroute, but it gives you, I think... I don’t know, a long time ago I learned it and thought it was the most amazing thing. When I was using traceroutes, I started using pathpings.

00:11:59.730 –> 00:12:00.810
Harvey Green: Hmm, interesting.

00:12:01.890 –> 00:12:02.340
Andy Whiteside: Something.

00:12:02.520 –> 00:12:16.740
Jirah Cox: worth calling out: almost every time that you entertain this kind of design with Nutanix Metro, this is going to be over either private lines or a VPN tunnel or dark fiber. You know, you wouldn’t really do it over the real Internet.

00:12:16.860 –> 00:12:22.170
Harvey Green: Yeah, another good point. You know, you don’t want this to go outside your network and come back.

00:12:24.030 –> 00:12:27.450
Jirah Cox: Well, it would add dramatic unpredictability to latency.

00:12:27.480 –> 00:12:28.860
Harvey Green: Exactly yeah.

00:12:29.730 –> 00:12:36.600
Andy Whiteside: So the very first time I experienced this, I had a university they had two data centers across the parking lot from each other.

00:12:37.110 –> 00:12:46.020
Andy Whiteside: And they had super low latency, and they had redundant fiber that went out the back of the building, out the front of the building, under the parking lot, around the parking lot.

00:12:46.560 –> 00:13:01.710
Andy Whiteside: And they actually didn’t deal with a metro cluster. They ended up going with a single cluster, and they put two nodes in one building and two nodes in the other. And the Nutanix guys were looking really not happy with that, but they did it, and it seems to have worked. Is that common?

00:13:02.520 –> 00:13:06.060
Jirah Cox: Not common, not recommended, glad it’s working out for them.

00:13:07.200 –> 00:13:10.530
Jirah Cox: But I would also postulate that’s probably untested for failure.

00:13:12.870 –> 00:13:14.370
Jirah Cox: Primarily because.

00:13:15.630 –> 00:13:20.400
Jirah Cox: We never recommend you know with any distributed system right you wouldn’t want.

00:13:22.080 –> 00:13:29.070
Jirah Cox: a quorum-violating number of nodes, yeah, to be able to be partitioned, right, or split-brained.

00:13:29.310 –> 00:13:41.820
Jirah Cox: So if you have two and two and they can’t communicate, neither one really understands which one is the quorum, and either of those failures constitutes, like, an RF-violating level of failure.

00:13:42.240 –> 00:13:55.500
Harvey Green: Right, yeah. If you’ve got two and two, and one of those sites goes down, and now you have two out of four, then you’re not in a good position to recover from anything. I mean, you’re...

00:13:56.460 –> 00:14:03.660
Harvey Green: I guess everything at that point will begin thrashing trying to figure out where the other side is because we need at least three.

00:14:04.590 –> 00:14:15.630
Jirah Cox: Well, you need three, but also, for a four-node cluster you’d only configure RF2, which is a lose-one-of-anything type of scenario, and instantly we’ve lost two, so we’re kind of already in a bad way.
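
The two-and-two problem comes down to two simple checks, sketched below with hypothetical helper names. A strict-majority quorum rule and "RF2 tolerates one failure" are the only assumptions; this is not Nutanix's actual membership logic.

```python
def has_quorum(reachable_nodes, total_nodes):
    """Strict majority: more than half of all nodes must be reachable."""
    return 2 * reachable_nodes > total_nodes


def failures_tolerated(replication_factor):
    """With RF copies of the data, RF - 1 simultaneous failures are survivable."""
    return replication_factor - 1


# A four-node RF2 cluster split evenly across two buildings:
side_a_quorum = has_quorum(2, 4)                  # False
side_b_quorum = has_quorum(2, 4)                  # False
within_rf_budget = 2 <= failures_tolerated(2)     # False: two nodes lost, one allowed
```

When the link between the buildings fails, neither half holds a majority and each half has also lost more nodes than RF2 allows, so stopping and waiting is the only safe outcome.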

00:14:15.900 –> 00:14:23.070
Andy Whiteside: Yes, well and I guess the thing that made it slightly okay was the fact they had lots of redundancy which physically that might have been Okay, but.

00:14:23.400 –> 00:14:30.570
Andy Whiteside: Still, could have been a human error that could have wiped out that redundancy and next thing you know everything is going crazy thinking it’s, the only thing left on the planet.

00:14:30.840 –> 00:14:32.580
Harvey Green: Absolutely yeah.

00:14:33.480 –> 00:14:43.650
Jirah Cox: We wouldn’t really go crazy, but there’s no, you know, surviving side here, whichever one you call it, that’s okay with it and just keeps on running, out of the pair of twos.

00:14:44.040 –> 00:14:53.580
Jirah Cox: Right, we’ll just stop all operations, wait until we can restore cluster health, and then resume forward in time. You know, Nutanix is a remarkably resilient platform

00:14:54.090 –> 00:15:05.850
Jirah Cox: in terms of, like, taking good care of customer data and accommodating the unexpected, but that accommodating can often be, like, we’ll just turtle up and safeguard your data until you heal the network, and then we get back up and running.

00:15:07.110 –> 00:15:10.050
Andy Whiteside: It’s my life at home: something will happen, and everybody just sits and waits for dad to get home.

00:15:16.500 –> 00:15:18.300
Andy Whiteside: My kids my wife is gonna listen to this.

00:15:18.420 –> 00:15:19.020
Andy Whiteside: Oh, never mind.

00:15:20.130 –> 00:15:23.280
Harvey Green: yeah something something sort of literal split brain yeah.

00:15:23.970 –> 00:15:40.800
Harvey Green: Yeah, I was going to say, good for you, they turtle up and wait for you to get home. I’m sure it is annoying, but not as annoying as if they continue to just go further and further down that rabbit hole, so then by the time I get to it, it’s 10 times worse than it was when it started.

00:15:42.060 –> 00:15:45.090
Andy Whiteside: Honestly, my boys will definitely do that, like they.

00:15:46.980 –> 00:15:51.540
Andy Whiteside: just keep wrapping the string around the lawn mower blade until it’s just unfixable.

00:15:53.610 –> 00:15:55.980
Harvey Green: yeah yes that’s what I expect.

00:15:58.260 –> 00:15:58.530
Andy Whiteside: Okay.

00:15:59.220 –> 00:16:12.570
Jirah Cox: So we’ve got our two Nutanix clusters, one cluster per site. Each of them has, you know, those datastores, those containers, for, you know, from-me-to-you and from-you-to-me replication, created on both sides.

00:16:14.520 –> 00:16:18.660
Jirah Cox: Then, of course, we want to add all nodes to a VMware cluster in vCenter.

00:16:19.950 –> 00:16:30.660
Jirah Cox: One cluster, right? So this is a key point: we’re going to take two Nutanix clusters and add them both to one single vCenter cluster. The reason for that is because, when we’re using ESXi as our

00:16:31.200 –> 00:16:45.300
Jirah Cox: hypervisor, the management and control plane dictates that the HA boundary is at the cluster level, so you can’t have vSphere HA from one cluster to another. That’s why we want all nodes to be in one single vSphere cluster.

00:16:46.650 –> 00:16:57.030
Jirah Cox: So it’s intuitive once you think about it, but some people might on their own think they want two clusters, which would be an accurate geographical representation, but we actually want the opposite. We want one:

00:16:58.140 –> 00:17:02.250
Jirah Cox: one logical cluster with two participating Nutanix clusters within it.

00:17:04.650 –> 00:17:18.330
Jirah Cox: Once they’re in there, there’s an accommodation here given for, like, how vCLS gets placed, right, the way that vSphere does modern HA. That, of course, is fine, doesn’t need any real tinkering. Just create a vCLS datastore per site, because those VMs won’t migrate, won’t move.

00:17:20.370 –> 00:17:31.470
Jirah Cox: Within that, there are some recommendations here on DRS, right? Basically leave DRS fully automated, threshold three, of course, no power management, no advanced options. That’s all pretty straightforward.

00:17:32.820 –> 00:17:45.780
Jirah Cox: Some flags here around APD handling for each host, for how we want to handle APD events. HA handling, of course: monitor VMs for availability; on host failure, restart VMs.

00:17:47.160 –> 00:17:51.660
Jirah Cox: Admission control failures, right: for most RF2 clusters, tolerate one failure.

00:17:52.740 –> 00:18:01.530
Jirah Cox: For datastore heartbeating, right, select the two metro datastores, and then there are advanced options there as well. So now at this point we’ve got

00:18:03.510 –> 00:18:13.380
Jirah Cox: our containers created, our Nutanix clusters created, our single vCenter cluster created, and we’ve got our DRS and HA options set for both of those two clusters.

00:18:13.680 –> 00:18:17.250
Andy Whiteside: And Jirah, where is vCenter living in this world? Does it matter?

00:18:19.260 –> 00:18:29.610
Jirah Cox: Ideally, the book says the most right answer is it lives somewhere else, right? We don’t want it to necessarily be on one of these clusters that it is protecting. For Jeroen’s lab environment, yeah, he calls out that he even does that.

00:18:30.840 –> 00:18:39.480
Jirah Cox: But there are considerations for, like, an actual production environment, where if you’re going to go do this for real for your day job, vCenter should go elsewhere, right? Yeah, and there’s actually a reason

00:18:40.200 –> 00:18:53.280
Jirah Cox: we’ll get back to. So we’ll also be creating, later on, a witness VM. The witness VM absolutely has to be outside the clusters that it’s protecting, right? So it’s going to be witnessing which of my two clusters is,

00:18:54.870 –> 00:19:03.450
Jirah Cox: you know, alive, which one’s my survivor, which one’s offline. So we already have a third availability zone, right, for the witness. vCenter should go there as well, probably.

00:19:05.370 –> 00:19:13.350
Andy Whiteside: And Jirah, maybe we’re going into a section down below here where it talks more about the witness. Is it a witness looking at both clusters? Is that how that works?

00:19:13.410 –> 00:19:18.930
Jirah Cox: Correct, yep. One witness, both clusters: both can see it, both get registered to it, yep.

00:19:19.560 –> 00:19:32.820
Andy Whiteside: Does anybody, does Nutanix, offer that piece as a service, like, external? Or maybe I just came up with a business model. Does Nutanix offer that as a service, or is that a VM that lives on the network somewhere?

00:19:33.480 –> 00:19:43.230
Jirah Cox: It is a VM that does live somewhere on your network. In terms of as a service, I mean, I sell Nutanix clusters as a service that can run witness VMs.

00:19:44.220 –> 00:19:59.040
Jirah Cox: But also, in a less jokey way, you can run that anywhere you want to, right? You can run it on non-Nutanix, right, on anything you wanted to. You could run it on a Nutanix cluster in public cloud, anywhere you needed to that you want to treat like a third availability zone.

00:19:59.730 –> 00:20:04.650
Andy Whiteside: There’s not a lot to those things. It sounds scary, and then you do it one time and go, that’s all I did?

00:20:04.950 –> 00:20:14.880
Jirah Cox: No, it’s very lightweight, and all it does is basically just, you know, I’m sure it is more than this, but basically just receive heartbeats and then let someone know that it’s the survivor

00:20:16.320 –> 00:20:17.280
Jirah Cox: if they can’t see each other.
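
As a way to picture what "receive heartbeats and tell someone it is the survivor" means, here is a toy arbitration function. Everything in it is an assumption for illustration, including the function name, the return values and the tie-break toward a preferred site; it is not the Nutanix witness algorithm.

```python
def witness_decision(sites_see_each_other, witness_sees_a, witness_sees_b,
                     preferred="A"):
    """Toy arbitration: which site, if any, should be told it is the survivor."""
    if sites_see_each_other:
        return "no action"      # replication is healthy, nothing to arbitrate
    if witness_sees_a and witness_sees_b:
        return preferred        # only the inter-site link failed (assumed tie-break)
    if witness_sees_a:
        return "A"
    if witness_sees_b:
        return "B"
    return "none"               # the witness cannot vouch for either site


# Site B goes dark at three in the morning:
survivor = witness_decision(False, witness_sees_a=True, witness_sees_b=False)
```

The sketch also shows why the witness has to sit in a third location: if it lived in site B, it would go dark along with the site it is supposed to rule on.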

00:20:17.670 –> 00:20:21.570
Andy Whiteside: It’s some type of virtual appliance, right? It’s not a Windows box, is it?

00:20:21.750 –> 00:20:22.890
Jirah Cox: Correct, it’s a virtual appliance.

00:20:23.220 –> 00:20:25.380
Andy Whiteside: Not a container? That’s why it would be a virtual appliance.

00:20:25.920 –> 00:20:28.770
Jirah Cox: Correct, also not a container. Virtual appliance.

00:20:31.200 –> 00:20:37.230
Jirah Cox: So then one additional bit of construct here right is going to be creating host groups and.

00:20:38.400 –> 00:20:47.460
Jirah Cox: affinity rules within vCenter. So what that means is that we want to teach vCenter, right, I’ve got host group one and host group two.

00:20:48.000 –> 00:20:58.890
Jirah Cox: And then we’ll also create some “should” rules within vCenter to say these VMs should run on cluster one, these VMs should run on cluster two, because obviously, A, we want data locality, and B, we don’t want

00:20:59.850 –> 00:21:13.080
Jirah Cox: all the writes getting dragged across to the wrong cluster, because each datastore is owned by one cluster or the other. So we want to make sure that those VMs get preferential access to where the data actually lives and is readable.
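
The host-group and "should" rule idea can be modeled with plain data structures. The sketch below is not the vCenter API; the group names, host names, VM names and the `place` function are all hypothetical, and the point is only that a "should" rule states a preference that gives way when the preferred hosts are gone.

```python
# Hypothetical inventory: two hosts per site, grouped the way the rules expect.
HOST_GROUPS = {
    "site1-hosts": {"esx-a1", "esx-a2"},
    "site2-hosts": {"esx-b1", "esx-b2"},
}
VM_GROUPS = {
    "site1-vms": {"web01", "dc01"},
    "site2-vms": {"web02", "dc02"},
}
# Each rule reads: VMs in this group SHOULD run on hosts in that group.
SHOULD_RULES = [("site1-vms", "site1-hosts"), ("site2-vms", "site2-hosts")]


def place(vm, healthy_hosts):
    """Pick a host for vm: preferred group first, any healthy host otherwise."""
    for vm_group, host_group in SHOULD_RULES:
        if vm in VM_GROUPS[vm_group]:
            preferred = sorted(HOST_GROUPS[host_group] & healthy_hosts)
            if preferred:
                return preferred[0]
    fallback = sorted(healthy_hosts)
    return fallback[0] if fallback else None


all_hosts = HOST_GROUPS["site1-hosts"] | HOST_GROUPS["site2-hosts"]
normal_day = place("web01", all_hosts)                      # stays at site 1
site1_dark = place("web01", HOST_GROUPS["site2-hosts"])     # restarts at site 2
```

Because the rule is "should" and not "must", the same VM that normally stays next to its datastore's owning cluster can still be restarted at the other site when its home hosts are unavailable.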

00:21:13.290 –> 00:21:21.210
Andy Whiteside: And what do you see mostly in the field? Is it pretty much one cluster loaded up and the other one twiddling its thumbs, waiting, or do people kind of divide out the workloads?

00:21:21.570 –> 00:21:25.200
Jirah Cox: You could do either one there is probably.

00:21:27.570 –> 00:21:36.210
Jirah Cox: Probably no real need to go for full and empty, because statistically, I would hope that as you’re planning this as a company,

00:21:36.630 –> 00:21:46.740
Jirah Cox: your data centers have hopefully equal risk, right? There’s not one that’s more likely to go offline than the other. If that is the case, you know, if one goes down every third Tuesday for

00:21:47.370 –> 00:21:52.290
Jirah Cox: inexplicable reasons, then sure, treat that as the empty one, right, and plan to run no workloads there.

00:21:53.490 –> 00:22:04.980
Jirah Cox: But for the most part, most companies would prefer to run with, like, balanced workloads, right, and most application servers will break that way too, right? Run half my domain controllers on one side, half at the other, half of my,

00:22:06.870 –> 00:22:18.270
Jirah Cox: you know, web servers on one or the other, deploy AG databases with a leg on each side, right? So it’s better in most ways to split my workloads across those, but you could do either one, yeah.

00:22:19.410 –> 00:22:26.010
Andy Whiteside: I think the reality, at least up until recently, is you had the old building that was the data center, and you bought a new data center.

00:22:26.280 –> 00:22:35.040
Andy Whiteside: Now the old building became the backup, and it never should have been the data center to begin with, but, you know, all I can afford is one. But now, with the world of colos and cloud, that has gotten better.

00:22:36.810 –> 00:22:49.050
Jirah Cox: Yeah, it’d be my hope, but it is just that, a hope, that, you know, with high-quality modern colos, having two data centers, neither of which is inherently more risky than the other, can be an easy reality.

00:22:50.460 –> 00:22:56.370
Jirah Cox: So we’ve got our rules in place that, you know, certain VMs should run on one side, certain VMs should run at the other side.

00:22:59.100 –> 00:23:14.820
Jirah Cox: And then, yeah, we get here to deploying the witness. So we deploy that as a virtual appliance, give it an IP address, and then within each Nutanix cluster we register those clusters with the witness VM, so they both check in; they’re heartbeating to make sure that they are online.

00:23:15.930 –> 00:23:17.430
Jirah Cox: So, then, I lost it.

00:23:17.760 –> 00:23:18.990
Andy Whiteside: How often are they touching it.

00:23:19.440 –> 00:23:20.370
Jirah Cox: um I don’t know.

00:23:21.480 –> 00:23:23.040
Jirah Cox: Very often, it must be.

00:23:23.070 –> 00:23:24.090
Andy Whiteside: Like almost non stop.

00:23:27.660 –> 00:23:32.730
Jirah Cox: I mean, yeah, I would assume that’s got to be very, very frequent.

00:23:33.870 –> 00:23:38.100
Jirah Cox: Um, you know, mail a postcard once a week, that should be fine, right?

00:23:41.040 –> 00:23:47.760
Jirah Cox: So we can deploy it in your lab if you want; you can then go packet sniffing and watch how often those things call each other.

00:23:49.890 –> 00:23:51.450
Andy Whiteside: Why don’t you and Harvey do that and let me know.

00:23:53.640 –> 00:23:55.500
Jirah Cox: So you’re not that curious okay interesting.

00:23:56.340 –> 00:23:57.240
Andy Whiteside: have other things.

00:23:57.420 –> 00:23:59.160
Andy Whiteside: Of course we all do at this point.

00:23:59.190 –> 00:24:07.860
Andy Whiteside: But maybe, what was the guy’s name? He had a really cool name, the guy who wrote the blog. Jeroen, is that right? I’m sure he knows.

00:24:08.010 –> 00:24:08.430
Jirah Cox: There you go.

00:24:09.810 –> 00:24:24.990
Jirah Cox: So the last thing to do, then, is to turn on replication, right? We named our containers meaningfully, right, from one to two and from two to one, so we’re going to go in and enable replication in the direction named in the container.

00:24:26.340 –> 00:24:35.370
Jirah Cox: And that’s it, that is the end of the setup process. We talked you through it in the span of a single podcast episode. So then, what does this let you do?

00:24:36.330 –> 00:24:41.430
Jirah Cox: For one, right, of course, you can now live migrate across clusters, right, because vMotion within a datastore,

00:24:41.940 –> 00:24:49.890
Jirah Cox: vMotion within a cluster, is super easy to do at the vCenter level. So that lets you get your live migration, and this is where, like we opened with,

00:24:50.370 –> 00:25:01.710
Jirah Cox: there are scenarios where you can do planned workload migration to avoid, say, planned data center maintenance. So that would be easy, right, and now we’re back to rerouting around traffic on the way to school.

00:25:03.660 –> 00:25:14.970
Jirah Cox: So with that I can, you know, either update my DRS rules or turn off DRS. Probably updating the rule is better. Update my rule to say these VMs should run at the other site.

00:25:16.410 –> 00:25:18.810
Jirah Cox: They’ll migrate over, and then I can activate their,

00:25:19.830 –> 00:25:31.950
Jirah Cox: their container, right? So I can take the datastore named one-to-two and activate it at site two to make it readable there, so that I’ve moved all my workloads into one basket and now I can

00:25:32.730 –> 00:25:47.730
Jirah Cox: do maintenance back at data center one, yeah, and then I can move them back over. So I’ve got my live migration across data centers there pretty easily, and all I’m really doing is touching my DRS rules there for where that “should” rule will place my VMs.

00:25:48.840 –> 00:25:53.700
Jirah Cox: But then the other real outcome that most customers are going to look at this for is that, you know,

00:25:54.600 –> 00:26:04.470
Jirah Cox: the smoking-hole, data-center-goes-dark-at-three-in-the-morning kind of scenario, and my VMs are an HA event away from recovering at the alternate data center. So that’s sort of a,

00:26:04.800 –> 00:26:07.950
Jirah Cox: you know, no human touches a button, but all the workloads come back up.

00:26:08.730 –> 00:26:21.210
Jirah Cox: Usually that’s really the plan. You know, the migration for maintenance is a nice-to-have, but really the must-have is the no data loss, all data written across to another data center before it

00:26:21.990 –> 00:26:28.380
Jirah Cox: ever gets acknowledged, and then automatic workload resumption across the wire there at the other data center, are the real

00:26:30.090 –> 00:26:31.140
Jirah Cox: Huge deliverables.
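
The "written across to another data center before it ever gets acknowledged" behavior can be sketched roughly as below. This is a minimal illustration of synchronous replication in general, not Nutanix code: the `Site` class and `synchronous_write` function are hypothetical names, and real systems handle partial failures and witness arbitration in ways this sketch ignores.

```python
class Site:
    """Hypothetical stand-in for one cluster's storage container."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # key -> data held at this site
        self.online = True

    def write(self, key, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[key] = data


def synchronous_write(primary, remote, key, data):
    """Return True (acknowledge) only once both sites hold the data."""
    try:
        remote.write(key, data)    # ship the write across the wire first
        primary.write(key, data)   # then commit it locally
    except ConnectionError:
        return False               # never acknowledge a write held at one site
    return True


site1, site2 = Site("site1"), Site("site2")
acked = synchronous_write(site1, site2, "block-1", b"payload")
```

Because an acknowledged write already exists at both sites, a VM restarted at the surviving site sees every write the application was told had succeeded.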

00:26:31.740 –> 00:26:41.430
Andy Whiteside: So Harvey’s probably had more interaction with this than I have. Most of my world has been around VDI and building, you know, non-persistent on one side, non-persistent on the other, but have control over here, control over there.

00:26:41.820 –> 00:26:45.480
Andy Whiteside: And if one went down, all the other stuff just kept going. I ran at half my load, but

00:26:45.780 –> 00:26:52.230
Andy Whiteside: I still had something available. Now, where it really becomes interesting in the VDI world is all those persistent desktops that,

00:26:52.530 –> 00:27:07.560
Andy Whiteside: you know, they can’t live in both places at the same time. This all of a sudden solves that challenge, and there are lots of rollouts where they have hundreds if not thousands of persistent VDI workloads that need to be available even if one data center or the other’s cluster was gone.

00:27:08.730 –> 00:27:09.390
Metro cluster.

00:27:12.120 –> 00:27:22.290
Andy Whiteside: Harvey, this one’s always been fascinating to me, so I’ve kind of dominated. Any specific comments or questions or thoughts on metro clusters and what you’ve seen?

00:27:23.700 –> 00:27:42.000
Harvey Green: I mean, for me, it’s, I guess, what I would have expected. I mean, not a lot of... well, I shouldn’t say not a lot. There are lots of customers who do get this kind of flexibility, because they have those sites that are geographically,

00:27:43.860 –> 00:27:51.300
Harvey Green: geographically available for this, something that’s open enough or fast enough or low enough latency that they can pull this off.

00:27:52.710 –> 00:28:11.160
Harvey Green: It doesn’t happen for everybody, so it’s not something that I’d say I run into very often, but it’s definitely something that I’ve run into often enough that, you know, it’s good to know the ins and outs behind it and how you can set it up to test like Jerome did.

00:28:12.450 –> 00:28:13.710
Jirah Cox: And some of the key, like,

00:28:13.920 –> 00:28:24.630
Jirah Cox: what-it’s-not callouts: like, you don’t need specific node types, not a ton of virtual appliances to deploy. It’s the one witness, right, and beyond that you’re just using regular Nutanix clusters to do it. So,

00:28:24.660 –> 00:28:32.550
Andy Whiteside: Jirah, that brings up a couple things. Let me ask the one question that just popped in my head: I guess you’ve got to have at least the same amount of storage capacity on both sides.

00:28:32.910 –> 00:28:39.810
Andy Whiteside: If you’re going to be going from each direction towards each other, you’ve got to have enough storage to cover that replication, right?

00:28:39.990 –> 00:28:45.030
Jirah Cox: I don’t know if it’s a hard requirement, but boy, I would recommend it, yeah. Otherwise, like, it gets pretty interesting, yeah.
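
The sizing concern in this exchange comes down to simple arithmetic: with containers replicating in both directions, each site holds its own active data plus a full copy of the other site’s. The sketch below is a back-of-the-envelope check only; the function names and the TiB figures are made up for illustration, and it ignores replication factor, compression, snapshots, and other overhead.

```python
def required_capacity_tib(active_here_tib, active_there_tib):
    """Usable capacity one site needs: its own data plus the replica of the other's."""
    return active_here_tib + active_there_tib


def can_host_metro(site_capacity_tib, active_here_tib, active_there_tib):
    """True if the site can hold both its active container and the incoming replica."""
    return site_capacity_tib >= required_capacity_tib(active_here_tib, active_there_tib)


# Illustrative numbers: site one runs 40 TiB of active data, site two runs 25 TiB.
needed = required_capacity_tib(40, 25)
site_one_ok = can_host_metro(80, 40, 25)
site_two_ok = can_host_metro(60, 25, 40)
```

In this example both sites need 65 TiB usable, so an 80 TiB site fits while a 60 TiB site does not, which is why matching capacity on both sides is the safe default.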

00:28:45.330 –> 00:28:48.780
Andy Whiteside: And then we talked all through this article, this blog, about doing it with

00:28:49.140 –> 00:28:50.400
Andy Whiteside: VMware. It probably just

00:28:50.400 –> 00:28:54.570
Andy Whiteside: gets even slightly easier, if not a lot easier, on AHV, or not?

00:28:55.080 –> 00:29:09.090
Jirah Cox: Correct, yeah, because in this one, right, we’re sort of using the construct in vCenter of HA as a cluster property. Nutanix already knows, when it’s managing an AHV environment, that each cluster, you know, has its own HA control plane.

00:29:10.170 –> 00:29:18.270
Jirah Cox: So the steps are a little bit different to set it up, but yeah, VM recovery automatically across sync rep on AHV is also possible. Yeah, cool.

00:29:19.440 –> 00:29:25.110
Andy Whiteside: Well, guys, I think that wraps it up. This is very interesting. This, to me, is a little more interesting than Red Hat OpenShift, I have to admit.

00:29:27.900 –> 00:29:28.230
Andy Whiteside: i’m.

00:29:28.260 –> 00:29:29.190
Jirah Cox: Sorry, Red Hat.

00:29:30.420 –> 00:29:34.740
Andy Whiteside: It’s good stuff, it’s just not, you know, where my passion’s been.

00:29:36.360 –> 00:29:41.220
Andy Whiteside: There was a brief, like, three months there where I tried to become a Linux server guy in the early 2000s.

00:29:42.390 –> 00:29:58.710
Andy Whiteside: That and a Unix guy, and then I got away. I did not make it. I ended up back in the Windows and the VDI world, which I thought was more fun. But I get it, I love getting down deep into the kernel and commanding myself away.

00:30:00.000 –> 00:30:04.080
Jirah Cox: And both are important. You could use this to protect your Red Hat OpenShift environment, for sure.

00:30:05.550 –> 00:30:10.590
Andy Whiteside: Well, guys, happy Monday, and thanks for joining. This was fun, and we’ll do it again in a week.

00:30:11.430 –> 00:30:12.000
Jirah Cox: Every Wednesday.

00:30:12.960 –> 00:30:13.440
Harvey Green: See you guys.