53: Nutanix Weekly: Step-by-Step Guide to Deploying Nutanix Metro Availability

May 11, 2022

Host: Andy Whiteside
Co-host: Harvey Green
Co-host: Jirah Cox

WEBVTT

1
00:00:02.520 –> 00:00:06.629
Andy Whiteside: Everyone, welcome to episode 53 of Nutanix Weekly. I’m your host, Andy Whiteside. I’ve got

2
00:00:07.589 –> 00:00:24.630
Andy Whiteside: Harvey Green with me, who is kind of local, he’s, I don’t know, 45 minutes away by car, and Jirah’s now two and a half, three hours away by car. We were just debating how to define local in 2022. Harvey, what was your answer to how you define local in 2022?

3
00:00:25.080 –> 00:00:38.130
Harvey Green: I told Jirah that the latency between where he is and where I am, round trip in my car, isn’t what it used to be at all, so he’s not as local as he used to be, yeah.

4
00:00:38.880 –> 00:00:45.180
Jirah Cox: If I recall my high school physics, saying I’m two and a half to three hours away makes certain assumptions about speed.

5
00:00:46.410 –> 00:00:46.890
Andy Whiteside: Well yeah.

6
00:00:47.550 –> 00:00:49.290
Harvey Green: I defined it as in my car.

7
00:00:51.120 –> 00:01:01.110
Andy Whiteside: Well, and you’re really saying how fast can I get to you, in this case get there and get back, and have somebody acknowledge that you got there. And this is really good for what we’re gonna talk about here in a second.

8
00:01:03.240 –> 00:01:10.020
Andy Whiteside: But you know, so Harvey, I’m not doing it today, but let’s say I’m able to leave my office and go pick up my kids at school. I know exactly how to get there.

9
00:01:10.320 –> 00:01:18.960
Andy Whiteside: Yes, but I don’t know what traffic is going to be like where I live. Where you live it’s a nightmare, I’m sure, and where I live now is probably a nightmare too.

10
00:01:19.590 –> 00:01:25.170
Andy Whiteside: You know, I don’t know how to get there, because I don’t know what the traffic situation is going to be. Therefore, I use smart software.

11
00:01:25.500 –> 00:01:34.680
Andy Whiteside: Okay, like an SD-WAN type of thing, but in this case I use GPS, and it’s in my car. I turn it on even though I know where I’m going, and on occasion it’s like, oh no, don’t go that way, you need

12
00:01:34.680 –> 00:01:35.040
Right.

13
00:01:36.240 –> 00:01:40.290
Andy Whiteside: I’ve got to be stubborn: oh, I’m going this way. And I go the way I know, and it’s like, oh no, car accident.

14
00:01:40.500 –> 00:01:41.970
Three hours sitting in traffic.

15
00:01:43.800 –> 00:01:51.420
Andy Whiteside: Which kind of maybe takes us to our topic for today. We were talking about stretch clusters the other week, and I brought up

16
00:01:52.770 –> 00:01:54.450
Andy Whiteside: Metro, you know, as it was called.

17
00:01:57.120 –> 00:02:07.800
Andy Whiteside: And Jirah has brought that topic for today from the Nutanix Community blogs: Step-by-Step Guide to Deploying Nutanix Metro Availability.

18
00:02:08.280 –> 00:02:17.970
Jirah Cox: Yes, Jeroen wrote up a fantastic blog post here. I love how our Community blog engine assigned him the label of Trendsetter. Fantastic.

19
00:02:18.750 –> 00:02:33.450
Jirah Cox: Congrats, Jeroen, on your Trendsetter status. And yeah, I think another fun connection to make here, between you driving somewhere you’ve already learned how to get to: the value is the software, as you called out, but also the data, right, the real-time data of, like,

20
00:02:34.530 –> 00:02:46.020
Jirah Cox: how are the roads running, right, and what do I want to avoid as I get from A to B? That has a lot of applicability, right, because we can use a metro cluster obviously for

21
00:02:47.040 –> 00:02:56.910
Jirah Cox: worst-case scenarios, but there’s value even short of worst-case scenarios as well, right? So we can use it for, say, planned maintenance, right? If I know that I need to

22
00:02:59.730 –> 00:03:08.670
Jirah Cox: get ahead of, say, generator maintenance or transfer switch or UPS maintenance, maybe I want to go ahead and move workloads over to the other data center anyway, right, rather than

23
00:03:10.050 –> 00:03:15.030
Jirah Cox: you know, trust that UPS maintenance is going to go totally, totally smoothly, yeah.

24
00:03:16.170 –> 00:03:23.910
Andy Whiteside: The goal is to get there and get there as fast as you can, but you want to get there in one piece so getting there is the most important part.

25
00:03:24.150 –> 00:03:26.370
Andy Whiteside: yeah we have become overly.

26
00:03:27.540 –> 00:03:37.320
Andy Whiteside: overly familiar with the idea, the acceptance, that we have to get there fast, and that’s today’s age, but it’s about getting there, getting there all in one piece.

27
00:03:38.730 –> 00:03:39.480
Harvey Green: Yes.

28
00:03:40.500 –> 00:03:43.290
Harvey Green: Totally. No packet or limb loss.

29
00:03:49.170 –> 00:03:50.760
Jirah Cox: I can agree, yeah.

30
00:03:54.210 –> 00:03:56.580
Jirah Cox: This podcast is officially against limb loss.

31
00:03:56.610 –> 00:03:58.890
Jirah Cox: Yes, i’ll plant that flag.

32
00:03:59.850 –> 00:04:01.590
Andy Whiteside: No blood and guts and gore.

33
00:04:02.880 –> 00:04:03.510
Harvey Green: that’s right.

34
00:04:03.690 –> 00:04:04.200
Jirah Cox: So to.

35
00:04:04.500 –> 00:04:05.910
Andy Whiteside: I mean, no data loss either, though.

36
00:04:06.480 –> 00:04:07.500
Jirah Cox: Totally no data loss.

37
00:04:07.680 –> 00:04:08.670
Jirah Cox: No, no data loss.

38
00:04:08.970 –> 00:04:15.150
Jirah Cox: So to start painting a word picture here, right, in Jeroen’s write-up here, what we have is

39
00:04:15.810 –> 00:04:31.740
Jirah Cox: I think this is a lab deployment. So we start off with two Nutanix clusters: we want one cluster in, like, site A or availability zone A or data center A, and another cluster in site two, availability zone two, data center two, however you break them up.

40
00:04:32.790 –> 00:04:45.030
Jirah Cox: The entire point, or the end state, of deploying a Nutanix metro cluster, which is really two Nutanix clusters, and in this case this is using vSphere as the hypervisor, which is not required, you can do it on AHV as well,

41
00:04:46.440 –> 00:04:54.660
Jirah Cox: is the outcome of, you know, all the humans are asleep at three in the morning, one data center goes fully dark, loses power, let’s say.

42
00:04:55.590 –> 00:05:06.390
Jirah Cox: I want all workloads to automatically resume at the other data center and, to your point, Harvey, with zero data loss, right? So just as if the VMs crashed at one site

43
00:05:06.900 –> 00:05:13.050
Jirah Cox: and they will power on at the other site, as if it was an HA event from node to node within one cluster, but we’re going to cross

44
00:05:13.380 –> 00:05:22.980
Jirah Cox: a larger geography than just within one cluster, right? We’re not going from blade to blade, node to node, rack to rack. This is going to go from data center A to data center B, right, is the

45
00:05:24.420 –> 00:05:28.920
Jirah Cox: expected and desired outcome. So to do that, we’ve got our clusters at each site.

46
00:05:29.940 –> 00:05:37.050
Jirah Cox: Because each cluster is an autonomous unit, right, I’m going to write, in this case, probably two copies of data in cluster A and two copies of data in cluster B,

47
00:05:37.620 –> 00:05:44.880
Jirah Cox: keep them in sync, and then with synchronous replication, right, no data loss, I wait for that remote acknowledgement.

48
00:05:45.300 –> 00:05:57.930
Jirah Cox: So if a VM is running at site A, I wait for site B to tell me, hey, I’ve got that data as well, before I acknowledge back to the guest VM, hey, I’ve got your data, right? So the write doesn’t succeed to the guest OS until we’ve got data written on both clusters.
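
The write path described here can be sketched roughly as follows. This is a minimal illustration, not Nutanix code: the `Cluster` class, the `synchronous_write` function, and the site names are hypothetical, and a real system would also handle the case where one side stored data that was never acknowledged.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical stand-in for one storage cluster at one site."""
    name: str
    reachable: bool = True
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def write(self, key: str, data: bytes) -> bool:
        # Return True only once the data is stored on this cluster.
        if not self.reachable:
            return False
        self.blocks[key] = data
        return True


def synchronous_write(local: Cluster, remote: Cluster, key: str, data: bytes) -> bool:
    """Acknowledge the guest write only if BOTH clusters stored the data."""
    local_ok = local.write(key, data)
    remote_ok = remote.write(key, data)  # the wait on the remote site happens here
    return local_ok and remote_ok


site_a = Cluster("site-a")
site_b = Cluster("site-b")
print(synchronous_write(site_a, site_b, "block-1", b"payload"))  # True
site_b.reachable = False
print(synchronous_write(site_a, site_b, "block-2", b"payload"))  # False
```

The point of the sketch is the return value: the guest never sees a successful write unless the remote site has confirmed it too.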

49
00:05:59.130 –> 00:06:00.480
Jirah Cox: So, within each cluster then.

50
00:06:01.860 –> 00:06:12.600
Jirah Cox: You know, then on each cluster I create a local data store, basically, or a data store that goes from site A to B, and then at site B I create a data store container

51
00:06:12.960 –> 00:06:21.240
Jirah Cox: from site B to A, so I’ve got both my directions there. Jeroen uses the nomenclature metro 1-2 and metro 2-1.

52
00:06:22.290 –> 00:06:35.760
Jirah Cox: But basically it shows in the labeling, right, for a human to see: these are VMs that replicate from one site to another site, and what the direction is there, so we can keep it straight as we’re, like, building a new VM in our VM creation wizard. So.

53
00:06:36.090 –> 00:06:47.490
Andy Whiteside: I’ve got cluster number one in building one, and then how far away can cluster number two be to do this, and how much redundancy do I have to have?

54
00:06:48.600 –> 00:06:49.200
Jirah Cox: So.

55
00:06:49.320 –> 00:06:50.790
Jirah Cox: The Nutanix limit is not set

56
00:06:50.880 –> 00:07:02.250
Jirah Cox: by how far away. It’s not a factor of distance, it’s a factor of latency. The book says we want to be five milliseconds away or less; in reality we want to be as close to the “or less” as possible.

57
00:07:03.420 –> 00:07:06.630
Jirah Cox: If you told me that the link has four milliseconds of latency from end to end.

58
00:07:07.110 –> 00:07:20.970
Jirah Cox: Then that’s four milliseconds you’re going to add to each and every single write for the life of the cluster, right? You’re slowing down all VM write operations by that delay, so we want that to be as low as possible. Zero milliseconds, one millisecond is much, much better.
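
To put rough numbers on that, here is a small arithmetic sketch. The 1 ms local write time is purely an assumed figure for illustration, and the function names are hypothetical; only the 4 ms link delay comes from the conversation.

```python
def write_latency_ms(local_ms: float, link_delay_ms: float) -> float:
    """Latency the guest sees when every write also waits on the remote site."""
    return local_ms + link_delay_ms


def serial_writes_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back writes issued one at a time."""
    return 1000.0 / latency_ms


local_only = write_latency_ms(1.0, 0.0)   # assumed 1 ms local write, no metro link
with_metro = write_latency_ms(1.0, 4.0)   # same write plus the 4 ms link delay

print(local_only, serial_writes_per_second(local_only))   # 1.0 1000.0
print(with_metro, serial_writes_per_second(with_metro))   # 5.0 200.0
```

Under those assumptions a single thread of dependent writes drops from 1000 to 200 per second, which is why a few milliseconds matter so much more to storage than to ordinary network traffic.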

59
00:07:21.240 –> 00:07:34.140
Harvey Green: So that’s very, very important that you bring that up because, you know, one of the things that we talk about a lot with Nutanix is the acknowledgement of writes, and that kind of lends itself to this a little bit too, but

60
00:07:34.680 –> 00:07:52.440
Harvey Green: you want to make sure that you’ve got the optimal setup and that you’re not killing yourself performance-wise to try to make this work, because you actually are waiting on that latency to take place before those writes are acknowledged as a write.

61
00:07:52.830 –> 00:08:03.090
Jirah Cox: 100%, yeah. If a VM lives in site one and you’ve configured it for synchronous replication to site two, then it’s going to wait until site two says I’ve got that data before it can proceed.

62
00:08:03.900 –> 00:08:19.890
Harvey Green: Yeah, so for all the people who haven’t been doing this very long, they say, well, four milliseconds, that’s nothing. I’ve got, you know, 30 or 40 or 50 or 60 between this site and this next site. Why would I care about four?

63
00:08:22.320 –> 00:08:24.450
Jirah Cox: Because it’s a lot for your storage system.

64
00:08:25.200 –> 00:08:30.480
Jirah Cox: If I started adding 60 milliseconds to everything you did on your laptop, you would be making a phone call

65
00:08:31.560 –> 00:08:32.940
Jirah Cox: or or looking for a new laptop.

66
00:08:33.660 –> 00:08:40.080
Harvey Green: Right, so I guess the big answer behind that is that it’s four milliseconds on every write.

67
00:08:40.410 –> 00:08:42.600
Jirah Cox: Every single write.

68
00:08:42.780 –> 00:08:48.300
Andy Whiteside: So are we really talking fiber configurations in order to get this done, or can we do it with copper these days?

69
00:08:49.200 –> 00:08:58.770
Jirah Cox: The medium doesn’t matter too much. Well, to be at any kind of meaningful distance you probably will be fiber away. I’m sure you could use copper technically; we’re properly agnostic.

70
00:09:00.390 –> 00:09:06.840
Jirah Cox: It almost becomes a matter, a function, of availability planning, right, and like shared nothing. So

71
00:09:07.290 –> 00:09:17.070
Jirah Cox: could I get from, like, data center hall to data center hall? We have some regional data centers around here that have multiple, kind of, shared-nothing availability zones under one roof.

72
00:09:18.420 –> 00:09:28.320
Jirah Cox: And if you want to do that, of course, you totally could but normally most customers will be looking to go like across the city, maybe across the state, depending on what size of state you’re sitting in when you listen to this.

73
00:09:29.610 –> 00:09:31.740
Jirah Cox: New England, yes, Texas, maybe not.

74
00:09:33.090 –> 00:09:34.230
Jirah Cox: in general.

75
00:09:34.320 –> 00:09:38.040
Jirah Cox: 50 ish miles i’ve heard is kind of a typical translation of.

76
00:09:39.180 –> 00:09:42.870
Jirah Cox: the kind of low latency you’d want to build this kind of a construct on top of.
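
As a back-of-the-envelope check on the 50-ish-mile figure, the sketch below estimates round-trip propagation delay over fiber. It assumes light travels about 200 km per millisecond in fiber (roughly two thirds of its speed in a vacuum) and a straight-line cable run; real links add switching, routing, and detours, so measured latency will be higher. The function and constant names are hypothetical.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumption: about two thirds of the speed of light


def fiber_round_trip_ms(distance_miles: float) -> float:
    """Propagation-only round-trip time over a straight fiber run."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_KM_PER_MS


rtt = fiber_round_trip_ms(50)
print(f"{rtt:.2f} ms round trip")  # 0.80 ms round trip
print(rtt <= 5.0)                  # True: inside the five-millisecond guidance
```

So propagation alone over 50 miles costs under a millisecond per round trip, which leaves headroom below the five-millisecond guidance for the network equipment in between.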

77
00:09:45.270 –> 00:09:48.090
Andy Whiteside: And that, that RF2, right,

78
00:09:49.110 –> 00:09:51.570
Andy Whiteside: redundancy factor of two.

79
00:09:51.630 –> 00:09:52.350
Harvey Green: Two, yeah.

80
00:09:52.590 –> 00:09:53.220
Andy Whiteside: Now, would you.

81
00:09:53.250 –> 00:09:56.580
Andy Whiteside: typically have, say, you’ve got to have at least redundancy factor of two on the other

82
00:09:56.580 –> 00:09:59.640
Andy Whiteside: side too, of course. And are we waiting on both of them to write, or just one of them?

83
00:10:00.930 –> 00:10:03.390
Jirah Cox: Both but they occur, you know simultaneously.

84
00:10:04.830 –> 00:10:15.480
Harvey Green: Yeah, so that latency we’re talking about, potentially that four milliseconds, only has to happen once, and then once it gets to the other side it’s writing that second write locally.

85
00:10:16.260 –> 00:10:20.610
Jirah Cox: Because, like, in some ways don’t overthink it. It’s the cluster.

86
00:10:20.910 –> 00:10:25.980
Jirah Cox: I send my data to the other cluster, and that cluster won’t acknowledge it until the write is compliant with the RF.

87
00:10:27.390 –> 00:10:30.870
Jirah Cox: Right, so you sort of are waiting on both replicas in the cluster to acknowledge it.

88
00:10:31.770 –> 00:10:39.510
Andy Whiteside: Okay, so it’s the cluster that’s the brain. A quick question for you, Jirah. I did a quick search: do we have to have Prism Central to pull this off or not?

89
00:10:39.660 –> 00:10:41.400
Jirah Cox: We actually do.

90
00:10:42.660 –> 00:10:51.120
Jirah Cox: not, really, for this one, as Jeroen’s set up in his lab. Yeah, not fully required. Recommended, adds a lot of value to every environment, yeah.

91
00:10:52.650 –> 00:11:02.070
Andy Whiteside: If you can see what I’m doing, I did a pathping out to yahoo.com and I’m working my way down the chain to start seeing, you know, more than four milliseconds and the like.

92
00:11:03.210 –> 00:11:21.600
Jirah Cox: Well, so remember that often there’s some black magic going on there, right? If you, say, ping Google, 8.8.8.8, right, or 8.8.4.4, either one, that’s a magical IP that can have a responder closer to my house than, say, California.

93
00:11:21.900 –> 00:11:22.260
Jirah Cox: Right.

94
00:11:22.320 –> 00:11:23.670
Jirah Cox: So there’s some there’s some.

95
00:11:26.550 –> 00:11:29.580
Jirah Cox: I don’t know enough to know if that’s a BGP funniness or

96
00:11:30.300 –> 00:11:37.500
Andy Whiteside: I’m actually not trying Google. See, the Yahoo in this case was 30-something milliseconds. I’m pinging things that I found the

97
00:11:37.800 –> 00:11:39.510
Andy Whiteside: Green oh along the way.

98
00:11:39.960 –> 00:11:40.650
yeah.

99
00:11:45.090 –> 00:11:45.750
Andy Whiteside: Impressive, hmm.

100
00:11:46.230 –> 00:11:47.790
Jirah Cox: I know, the pathping is interesting.

101
00:11:48.000 –> 00:11:48.360
Jirah Cox: it’s like it’s.

102
00:11:49.320 –> 00:11:59.520
Andy Whiteside: Crazy, yeah. It’s kind of like traceroute, but it gives you, I think, I don’t know. A long time ago I learned it and thought it was the most amazing thing. When I was using traceroutes, I started using pathpings.

103
00:11:59.730 –> 00:12:00.810
Harvey Green: hmm interesting.

104
00:12:01.890 –> 00:12:02.340
Andy Whiteside: Something.

105
00:12:02.520 –> 00:12:16.740
Jirah Cox: Worth calling out: almost every time that you entertain this kind of design with Nutanix Metro, this is going to be over either private lines or a VPN tunnel or dark fiber. You know, you wouldn’t really do it over the real Internet.

106
00:12:16.860 –> 00:12:22.170
Harvey Green: Yeah, another good point. You know, you don’t want this to go outside your network and come back.

107
00:12:24.030 –> 00:12:27.450
Jirah Cox: Well, it would add a dramatic unpredictability to latency.

108
00:12:27.480 –> 00:12:28.860
Harvey Green: Exactly yeah.

109
00:12:29.730 –> 00:12:36.600
Andy Whiteside: So the very first time I experienced this, I had a university they had two data centers across the parking lot from each other.

110
00:12:37.110 –> 00:12:46.020
Andy Whiteside: And they had super low latency and they had redundant fiber that went out the back of the building out the front of the building through the under the parking lot around the parking lot.

111
00:12:46.560 –> 00:13:01.710
Andy Whiteside: And they actually didn’t go with the metro cluster. They ended up going with a single cluster, and they put two nodes in one building, two nodes in the other, and the Nutanix guys were really not happy with that, but they did it and it seems to have worked. Is that common?

112
00:13:02.520 –> 00:13:06.060
Jirah Cox: Not common not recommended glad it’s working out for them.

113
00:13:07.200 –> 00:13:10.530
Jirah Cox: But I would also postulate that’s probably untested in failure.

114
00:13:12.870 –> 00:13:14.370
Jirah Cox: Primarily because.

115
00:13:15.630 –> 00:13:20.400
Jirah Cox: We never recommend you know with any distributed system right you wouldn’t want.

116
00:13:22.080 –> 00:13:29.070
Jirah Cox: a quorum-violating number of nodes to be able to be partitioned, right, or split-brained.

117
00:13:29.310 –> 00:13:41.820
Jirah Cox: So if you have two and two and then they can’t communicate, neither one really understands which one is the quorum, and either of those failures constitutes, like, an RF-violating level of failure.

118
00:13:42.240 –> 00:13:55.500
Harvey Green: Right, yeah. If you’ve got two and two, and one of those sites goes down, and now you have two of four, then you’re not in a good position to recover from anything. I mean, you’re,

119
00:13:56.460 –> 00:14:03.660
Harvey Green: I guess everything at that point will begin thrashing trying to figure out where the other side is because we need at least three.

120
00:14:04.590 –> 00:14:15.630
Jirah Cox: Well, you need three, but also for a four-node cluster you’d only configure RF2, which is a lose-one-of-anything type of scenario, and instantly we’ve lost two, so we’re kind of already in a bad way.
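
The arithmetic behind the two-plus-two problem can be sketched with a generic strict-majority quorum check. This is a textbook illustration, not how Nutanix implements cluster membership; the function names are hypothetical, and the "RF minus one" rule simply restates the lose-one-of-anything description of RF2.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Strict majority: more than half of all nodes must be visible."""
    return visible_nodes > total_nodes // 2


def tolerated_failures(replication_factor: int) -> int:
    """With N copies of the data, N - 1 simultaneous losses keep one copy."""
    return replication_factor - 1


# A four-node cluster split evenly across two buildings.
print(has_quorum(visible_nodes=2, total_nodes=4))  # False: neither half is a majority
print(has_quorum(visible_nodes=3, total_nodes=4))  # True
print(tolerated_failures(2))                       # 1, but a site loss takes out 2 nodes
```

An even split leaves both halves without a majority, and losing a whole building removes two nodes when RF2 only covers one, which is the double problem described above.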

121
00:14:15.900 –> 00:14:23.070
Andy Whiteside: Yes, well and I guess the thing that made it slightly okay was the fact they had lots of redundancy which physically that might have been Okay, but.

122
00:14:23.400 –> 00:14:30.570
Andy Whiteside: Still, could have been a human error that could have wiped out that redundancy and next thing you know everything is going crazy thinking it’s, the only thing left on the planet.

123
00:14:30.840 –> 00:14:32.580
Harvey Green: Absolutely yeah.

124
00:14:33.480 –> 00:14:43.650
Jirah Cox: We wouldn’t really go crazy, but there’s no, you know, whichever one you call the surviving side here being, like, okay with it and just keeping on running on the pair of two.

125
00:14:44.040 –> 00:14:53.580
Jirah Cox: Right, we’ll just stop all operations, wait until we can restore cluster health, and then resume forward in time. Nutanix is actually, you know, a remarkably resilient platform

126
00:14:54.090 –> 00:15:05.850
Jirah Cox: in terms of taking good care of customer data and accommodating the unexpected, but that accommodating can often be, like, we’ll just turtle up and safeguard your data until you heal the network and we get back up and running.

127
00:15:07.110 –> 00:15:10.050
Andy Whiteside: My life at home: something will happen and everybody just starts waiting for dad to get home.

128
00:15:16.500 –> 00:15:18.300
Andy Whiteside: My kids my wife is gonna listen to this.

129
00:15:18.420 –> 00:15:19.020
Andy Whiteside: Oh, never mind.

130
00:15:20.130 –> 00:15:23.280
Harvey Green: Yeah, some sort of literal split brain, yeah.

131
00:15:23.970 –> 00:15:40.800
Harvey Green: yeah I was gonna say good for you, they turtle up and wait for you to get home i’m sure it is annoying but not as annoying as they continue to just go further and further down that rabbit hole So then, by the time I get to it it’s 10 times worse than it was when it started.

132
00:15:42.060 –> 00:15:45.090
Andy Whiteside: Honestly, my boys will definitely do that, like they.

133
00:15:46.980 –> 00:15:51.540
Andy Whiteside: just keep wrapping the string around the lawn mower blade until it’s just unfixable.

134
00:15:53.610 –> 00:15:55.980
Harvey Green: yeah yes that’s what I expect.

135
00:15:58.260 –> 00:15:58.530
Andy Whiteside: Okay.

136
00:15:59.220 –> 00:16:12.570
Jirah Cox: So we’ve got our two Nutanix clusters, one cluster per site. Each of them has, you know, those data stores, those containers, that are, you know, from-me-to-you and from-you-to-me replication, created on both sides.

137
00:16:14.520 –> 00:16:18.660
Jirah Cox: Then of course we want to add all nodes to a VMware cluster in vCenter.

138
00:16:19.950 –> 00:16:30.660
Jirah Cox: One cluster, right? So this is a key point: we’re going to use two Nutanix clusters and add them both to one single vCenter cluster. The reason for that being, when ESXi is our

139
00:16:31.200 –> 00:16:45.300
Jirah Cox: hypervisor, the management and control plane dictates that the HA boundary is at the cluster level, so you can’t have vSphere HA from one cluster to another. That’s why we want all nodes to be in one single vSphere cluster.

140
00:16:46.650 –> 00:16:57.030
Jirah Cox: So it’s intuitive once you think about it, but some people might on their own think they want two clusters, which would be an accurate geographical representation, but we actually want the opposite. We want one.

141
00:16:58.140 –> 00:17:02.250
Jirah Cox: One logical cluster with two participating Nutanix clusters within it.

142
00:17:04.650 –> 00:17:18.330
Jirah Cox: Once they’re in there, there’s an accommodation here given for how vCLS gets placed, right, the way that vSphere does modern HA. That of course is fine, doesn’t need any real tinkering: just create a vCLS data store per site, because those VMs won’t migrate, won’t move.

143
00:17:20.370 –> 00:17:31.470
Jirah Cox: Within that there’s some recommendations here on DRS, right? Basically leave DRS fully automated, threshold three, of course no power management, no advanced options. That’s all pretty straightforward.

144
00:17:32.820 –> 00:17:45.780
Jirah Cox: Some flags here around the APD handling for each host, for how we want to handle APD events. HA handling, of course: monitor VMs for availability; on host failure, restart VMs.

145
00:17:47.160 –> 00:17:51.660
Jirah Cox: Admission control failures, right: for most RF2 clusters, tolerate one failure.

146
00:17:52.740 –> 00:18:01.530
Jirah Cox: For data store heartbeating, right, select the two metro data stores, and then advanced options can also stay as they are as well. So now at this point we’ve got

147
00:18:03.510 –> 00:18:13.380
Jirah Cox: our containers created, our Nutanix clusters created, our single vCenter cluster created, and we’ve got our DRS and HA options set on both of those two clusters.

148
00:18:13.680 –> 00:18:17.250
Andy Whiteside: And Jirah, where is vCenter living in this world? Does it matter?

149
00:18:17.580 –> 00:18:17.970
So.

150
00:18:19.260 –> 00:18:29.610
Jirah Cox: Ideally, the book says the most right answer is it lives somewhere else, right? We don’t want it to necessarily be on one of these clusters that it is protecting. For Jeroen’s lab environment, yeah, he calls out that he even does that.

151
00:18:30.840 –> 00:18:39.480
Jirah Cox: But there’s considerations for an actual production environment where, if you’re going to go do this for real for your day job, vCenter should go elsewhere, right? Yeah, and there’s actually a reason

152
00:18:40.200 –> 00:18:53.280
Jirah Cox: we’ll get back to. So we’ll also be creating later on a witness VM. The witness VM absolutely has to be outside the clusters that it’s protecting, right? So it’s going to be witnessing which of my two clusters is actually,

153
00:18:54.870 –> 00:19:03.450
Jirah Cox: you know, alive, which one’s my survivor, which one’s offline. So we already have a third availability zone, right, for the witness. vCenter should go there as well, probably.

154
00:19:05.370 –> 00:19:13.350
Andy Whiteside: And Jirah, maybe we’re going into a section down below here where it talks more about the witness. Is it a witness looking at both clusters? Is that how it works?

155
00:19:13.410 –> 00:19:18.930
Jirah Cox: Correct, yep. A witness: both clusters can see it, both get registered to it, yep.

156
00:19:19.560 –> 00:19:32.820
Andy Whiteside: Does anybody, does Nutanix offer that piece as a service, like external? Or maybe I just came up with a business model. But is that something Nutanix offers as software as a service, or is that a VM that lives on the network somewhere?

157
00:19:33.480 –> 00:19:43.230
Jirah Cox: It is a VM that does live somewhere on your network. In terms of as a service, I mean, we sell Nutanix clusters as a service that can run witness VMs.

158
00:19:44.220 –> 00:19:59.040
Jirah Cox: But also, in a less jokey way, you can run that anywhere you want to, right? You can run it on non-Nutanix, right, on anything you wanted to. You could run it on a Nutanix cluster in public cloud, anywhere you needed to that you want to treat like a third availability zone. Yeah, okay.

159
00:19:59.730 –> 00:20:04.650
Andy Whiteside: There’s not a lot to those things. It sounds scary, and then you do it one time and go, that’s all I did?

160
00:20:04.950 –> 00:20:14.880
Jirah Cox: No, it’s very lightweight, and all it does, basically, you know, I’m sure it is more than this, is just receive heartbeats and then let someone know that it’s the survivor, and if

161
00:20:16.320 –> 00:20:17.280
Jirah Cox: If they can’t see each other.
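
A toy version of that arbitration might look like the sketch below. Everything in it is an assumption made for illustration: the `pick_survivor` function, the heartbeat timeout, and especially the `preferred` tie-break used when both sites can still reach the witness but not each other. It is not the actual Nutanix witness logic.

```python
from typing import Dict, Optional


def pick_survivor(
    last_heartbeat: Dict[str, float],
    now: float,
    timeout_s: float,
    sites_see_each_other: bool,
    preferred: str,
) -> Optional[str]:
    """Return the site that should keep running, or None if no decision is needed."""
    if sites_see_each_other:
        return None  # healthy link: the witness has nothing to arbitrate
    # A site counts as alive if the witness heard from it recently enough.
    alive = [site for site, seen in last_heartbeat.items() if now - seen <= timeout_s]
    if not alive:
        return None  # the witness cannot confirm any survivor
    if preferred in alive:
        return preferred  # hypothetical tie-break for a pure link failure
    return alive[0]


heartbeats = {"site-1": 100.0, "site-2": 90.0}
print(pick_survivor(heartbeats, now=101.0, timeout_s=5.0,
                    sites_see_each_other=False, preferred="site-2"))  # site-1
```

Here site-2 stopped heartbeating eleven seconds ago, so the witness names site-1 the survivor even though site-2 was the preferred side.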

162
00:20:17.670 –> 00:20:21.570
Andy Whiteside: It’s some type of virtual appliance, right? It’s not a Windows box, is it?

163
00:20:21.750 –> 00:20:22.890
Jirah Cox: Correct, it’s a virtual appliance.

164
00:20:23.220 –> 00:20:25.380
Andy Whiteside: Not a container, that’s why it would be a virtual appliance.

165
00:20:25.920 –> 00:20:28.770
Jirah Cox: Correct, also not a container. Virtual appliance, okay.

166
00:20:31.200 –> 00:20:37.230
Jirah Cox: So then one additional bit of construct here right is going to be creating host groups and.

167
00:20:38.400 –> 00:20:47.460
Jirah Cox: affinity rules within vCenter. So what that means is that we want to teach vCenter, right, I’ve got host group one and host group two.

168
00:20:48.000 –> 00:20:58.890
Jirah Cox: And then we’ll also create some “should” rules within vCenter to say these VMs should run on cluster one, these VMs should run on cluster two, because obviously, A, we want data locality, and B, we don’t want

169
00:20:59.850 –> 00:21:13.080
Jirah Cox: all the writes getting dragged across to the wrong cluster, because each data store is owned by one cluster or the other. So we want to make sure that those VMs get preferential access to where the data actually lives and is readable.
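
The idea of a “should” rule, prefer one host group but fall back to the other when it has no healthy hosts, can be sketched as below. This models the behavior only; it is not the vCenter API, and the group names, host names, and `place_vm` function are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical host groups, one per Nutanix cluster / site.
HOST_GROUPS: Dict[str, List[str]] = {
    "site1-hosts": ["esx-1a", "esx-1b"],
    "site2-hosts": ["esx-2a", "esx-2b"],
}


def place_vm(preferred_group: str, healthy_hosts: Set[str]) -> str:
    """Pick a host the way a 'should' rule would: prefer, but do not insist."""
    preferred = [h for h in HOST_GROUPS[preferred_group] if h in healthy_hosts]
    if preferred:
        return preferred[0]
    # Soft rule: if the preferred site is gone, any healthy host elsewhere is allowed.
    fallback = [
        host
        for group, hosts in HOST_GROUPS.items()
        if group != preferred_group
        for host in hosts
        if host in healthy_hosts
    ]
    if fallback:
        return fallback[0]
    raise RuntimeError("no healthy host available")


print(place_vm("site1-hosts", {"esx-1a", "esx-1b", "esx-2a", "esx-2b"}))  # esx-1a
print(place_vm("site1-hosts", {"esx-2a", "esx-2b"}))                      # esx-2a
```

The second call is the failover case: a “must” rule would leave the VM powered off, while a “should” rule lets it restart at the other site.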

170
00:21:13.290 –> 00:21:21.210
Andy Whiteside: And what do you see mostly in the field? Is it pretty much one cluster loaded up and the other one idle, just waiting, or do people kind of divide out the workloads?

171
00:21:21.570 –> 00:21:25.200
Jirah Cox: You could do either one there is probably.

172
00:21:27.570 –> 00:21:36.210
Jirah Cox: Probably no real need to go for full and empty, because statistically I would hope that, as you’re planning this as a company,

173
00:21:36.630 –> 00:21:46.740
Jirah Cox: your data centers have, hopefully, equal risk, right? There’s not one that’s more likely to go offline than the other. If that is the case, you know, if one goes down every third Tuesday for

174
00:21:47.370 –> 00:21:52.290
Jirah Cox: inexplicable reasons, then sure, treat that as the empty one, right, and plan to run no workloads there.

175
00:21:53.490 –> 00:22:04.980
Jirah Cox: But for the most part, most companies would prefer to run with, like, balanced workloads, right? And most application servers will lean that way too, right: run half my domain controllers on one side, half at the other, half of my,

176
00:22:06.870 –> 00:22:18.270
Jirah Cox: you know, web servers on one and half on the other, deploy AG databases with a leg on each side, right? So it’s better in most ways to split my workloads across those, but you could do either one, yeah.

177
00:22:19.410 –> 00:22:26.010
Andy Whiteside: I think the reality, at least it has been up until recently, is you had the old building, that was the data center, and you bought a new data center.

178
00:22:26.280 –> 00:22:35.040
Andy Whiteside: Now the old building became the backup, and it never should have been the data center to begin with, but, you know, all I can afford is one. But now, with the world of colos and cloud, that has gotten better.

179
00:22:36.810 –> 00:22:49.050
Jirah Cox: Yeah, that’d be my hope, and it is just that, a hope, that, you know, with high-quality modern colos, having two data centers, neither of which is inherently more risky than the other, can be an easy reality.

180
00:22:50.460 –> 00:22:56.370
Jirah Cox: So we’ve got our rules in place that, you know, certain VMs should run on one side, certain VMs should run at the other side.

181
00:22:59.100 –> 00:23:14.820
Jirah Cox: And then, yeah, we get here to deploying the witness. So we deploy that as a virtual appliance, give it an IP address, and then within each Nutanix cluster we register those clusters with the witness VM, so they both check in, they’re heartbeating to make sure that they are online.

182
00:23:15.930 –> 00:23:17.430
Jirah Cox: So, then, I lost it.

183
00:23:17.760 –> 00:23:18.990
Andy Whiteside: How often are they touching it.

184
00:23:19.440 –> 00:23:20.370
Jirah Cox: um I don’t know.

185
00:23:21.480 –> 00:23:23.040
Jirah Cox: Very often, it must be.

186
00:23:23.070 –> 00:23:24.090
Andy Whiteside: Like almost non stop.

187
00:23:27.660 –> 00:23:32.730
Jirah Cox: I mean, yeah, I would assume, yeah, that’s got to be very, very frequent. Yeah, okay.

188
00:23:33.870 –> 00:23:38.100
Jirah Cox: um you know mail a postcard once a week that should be fine right.

189
00:23:41.040 –> 00:23:47.760
Jirah Cox: So we can deploy it in your lab if you want; you can then go packet sniffing and watch how often those things call each other.

190
00:23:49.890 –> 00:23:51.450
Andy Whiteside: Why don’t you and Harvey go do that and let me know.

191
00:23:53.640 –> 00:23:55.500
Jirah Cox: So you’re not that curious okay interesting.

192
00:23:56.340 –> 00:23:57.240
Andy Whiteside: I have other things.

193
00:23:57.420 –> 00:23:59.160
Andy Whiteside: Of course we all do at this point.

194
00:23:59.190 –> 00:24:07.860
Andy Whiteside: But maybe, what was the guy’s name? He had a really cool name, the guy who wrote the blog, Jeroen. I’m sure he knows.

195
00:24:08.010 –> 00:24:08.430
Jirah Cox: There you go.

196
00:24:09.810 –> 00:24:24.990
Jirah Cox: So, the last thing to do, then, is turn on replication, right? We named our containers meaningfully, right, from one to two and from two to one. We’re going to go in and enable replication in the direction named in the container.

197
00:24:26.340 –> 00:24:35.370
Jirah Cox: And that’s it. That is the end of the setup process. We talked you through it in the span of a single podcast episode. So then, what does this let you do?

198
00:24:36.330 –> 00:24:41.430
Jirah Cox: For one, of course, you can now live migrate across clusters, right? Because vMotion within a datastore,

199
00:24:41.940 –> 00:24:49.890
Jirah Cox: vMotion within a cluster, is super easy to do at the vCenter level. So that lets you get your live migration, and this is where, like we opened with,

200
00:24:50.370 –> 00:25:01.710
Jirah Cox: scenarios where you can do planned workload migration to avoid, say, planned data center maintenance. So that would be easy, right? And now we’re back to rerouting around traffic on the way to school.

201
00:25:03.660 –> 00:25:14.970
Jirah Cox: So with that I can, you know, either update my DRS rules or turn off DRS. Probably updating the rule is better: update my rule to say these VMs should run at the other site.

202
00:25:16.410 –> 00:25:18.810
Jirah Cox: They’ll migrate over, and then I can activate

203
00:25:19.830 –> 00:25:31.950
Jirah Cox: their container, right? So I can take the datastore named one-to-two and activate it at site two to make it readable there, so that I’ve moved all my workloads into one basket and now I can

204
00:25:32.730 –> 00:25:47.730
Jirah Cox: do maintenance back at data center one, yeah, and then I can move them back over. So I’ve got my live migration across data centers there pretty easily, and all I’m really doing is touching my DRS rules for where that “should” rule will place my VMs.
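
The planned-maintenance sequence described in the last few cues can be written down as an ordered runbook. This is only a sketch of the order of operations as spoken in the episode; the function name and the container and site labels are hypothetical, and it calls no real vCenter or Nutanix API.

```python
def planned_failover_steps(container, source_site, target_site):
    """Return the ordered steps for evacuating source_site, as described in the episode."""
    return [
        f"Update the DRS 'should run' rule so the VMs on {container} prefer {target_site}",
        f"Wait for the VMs to live migrate from {source_site} to {target_site}",
        f"Activate container {container} at {target_site}",
        f"Perform maintenance at {source_site}",
        f"Point the DRS rule back at {source_site} to move the VMs home",
    ]


for number, step in enumerate(planned_failover_steps("one-to-two", "site 1", "site 2"), start=1):
    print(f"{number}. {step}")
```

The only part that changes from one maintenance window to the next is which site the DRS rule points at, which matches the point that the DRS rule is the only thing being touched.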

205
00:25:48.840 –> 00:25:53.700
Jirah Cox: But then the other real outcome that most customers are going to look at this for is that, you know,

206
00:25:54.600 –> 00:26:04.470
Jirah Cox: the smoking-hole, data-center-goes-dark-at-three-in-the-morning kind of scenario, and my VMs are an HA event away from recovering at the alternate data center. So that’s sort of a,

207
00:26:04.800 –> 00:26:07.950
Jirah Cox: you know, no human touches a button, but all the workloads come back up.

208
00:26:08.730 –> 00:26:21.210
Jirah Cox: Usually that’s really the plan. You know, the migration for maintenance is a nice-to-have, but really the must-have is the no data loss: all data written across to another data center before it

209
00:26:21.990 –> 00:26:28.380
Jirah Cox: ever gets acknowledged, and then automatic workload resumption across the wire there at the other data center are the real

210
00:26:30.090 –> 00:26:31.140
Jirah Cox: Huge deliverables.
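
The "written across to another data center before it ever gets acknowledged" guarantee is the defining property of synchronous replication. A minimal sketch of that ordering follows; the `Site` class and `synchronous_write` function are hypothetical illustrations, not how the product is implemented.

```python
class Site:
    """Toy stand-in for one data center's storage (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def persist(self, key, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[key] = data


def synchronous_write(primary, secondary, key, data):
    """Acknowledge a write only after both sites have persisted it."""
    primary.persist(key, data)
    secondary.persist(key, data)  # raises if the remote site is unreachable
    return "ack"  # the caller never sees an ack for data that exists at only one site


site_1 = Site("data center 1")
site_2 = Site("data center 2")
result = synchronous_write(site_1, site_2, "block-42", b"payload")
```

Because the acknowledgement comes last, losing either site after an acknowledged write still leaves a full copy at the other one. The sketch does not model what a real system does when the remote site is down.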

211
00:26:31.740 –> 00:26:41.430
Andy Whiteside: So Harvey’s probably had more interaction with this than I have. Most of my world has been around VDI and building, you know, non-persistent on one side, non-persistent on the other, with half your control here and half your control over there.

212
00:26:41.820 –> 00:26:45.480
Andy Whiteside: And if one went down, all the other stuff just kept going. I ran at half my load, but

213
00:26:45.780 –> 00:26:52.230
Andy Whiteside: I still had something available. Now, where it really becomes interesting in the VDI world is all those persistent desktops that,

214
00:26:52.530 –> 00:27:07.560
Andy Whiteside: you know, they can’t live in both places at the same time. This all of a sudden solves that challenge, and there are lots of rollouts where they have hundreds if not thousands of persistent VDI workloads that need to be available if one day one cluster or the other was gone.

215
00:27:08.730 –> 00:27:09.390
Metro clusters.

216
00:27:12.120 –> 00:27:22.290
Andy Whiteside: Harvey, this one’s always been fascinating to me, so I’ve kind of dominated. Any specific comments or questions or thoughts on metro clusters and what you’ve seen?

217
00:27:23.700 –> 00:27:42.000
Harvey Green: I mean, for me, it’s what I guess I would have expected. I mean, not a lot of... well, I shouldn’t say not a lot. There are lots of customers who do get this kind of flexibility because they have those sites that are geographically

218
00:27:43.860 –> 00:27:51.300
Harvey Green: geographically available for this: something that’s open enough or fast enough or low enough latency that they can pull this off.

219
00:27:52.710 –> 00:28:11.160
Harvey Green: It doesn’t happen for everybody, so it’s not something that I’d say I run into very often, but it’s definitely something that I’ve run into often enough that, you know, it’s good to know the ins and outs behind it and how you can set it up to test like Jeroen did.

220
00:28:12.450 –> 00:28:13.710
Jirah Cox: And some of the key, like,

221
00:28:13.920 –> 00:28:24.630
Jirah Cox: what-it-is-not callouts: like, you don’t need specific node types, there’s not a ton of virtual appliances to deploy, it’s the one witness, right? And beyond that you’re just using regular Nutanix clusters to do it, so.

222
00:28:24.660 –> 00:28:32.550
Andy Whiteside: Jirah, that brings up a couple things. Let me ask the one question that just popped in my head: I guess you’ve got to have at least the same amount of storage capacity on both sides.

223
00:28:32.910 –> 00:28:39.810
Andy Whiteside: If you’re going to be going from each direction towards each other, you’ve got to have enough storage to cover that replication, right?

224
00:28:39.990 –> 00:28:45.030
Jirah Cox: I don’t know if it’s a hard requirement, but boy, would I recommend it, yeah. Otherwise, like, it gets pretty interesting, yeah.
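
The sizing intuition in this exchange is simple arithmetic: with replication running in both directions, each site holds its own active data plus a full copy of the other site's. The sketch below assumes exactly that and nothing more. The function names and TiB figures are hypothetical, and it ignores replication factor overhead, compression, deduplication, and snapshots.

```python
def required_capacity_tib(local_active_tib, remote_active_tib):
    """Capacity one site needs: its own active data plus the replica of the other site's."""
    return local_active_tib + remote_active_tib


def can_host_metro(site_capacity_tib, local_active_tib, remote_active_tib):
    """True if the site can hold both its own data and the incoming replica."""
    return site_capacity_tib >= required_capacity_tib(local_active_tib, remote_active_tib)


# Example with made-up numbers: site 1 runs 40 TiB, site 2 runs 50 TiB.
site_1_ok = can_host_metro(site_capacity_tib=100, local_active_tib=40, remote_active_tib=50)
site_2_ok = can_host_metro(site_capacity_tib=80, local_active_tib=50, remote_active_tib=40)
```

In this example both sites need the same 90 TiB, which is why matching capacity on both sides is the comfortable default even if, as noted, it may not be a hard requirement.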

225
00:28:45.330 –> 00:28:48.780
Andy Whiteside: And then we talked all through this article, this blog, about doing it with

226
00:28:49.140 –> 00:28:50.400
Andy Whiteside: VMware. It probably just

227
00:28:50.400 –> 00:28:54.570
Andy Whiteside: gets even slightly easier, if not a lot easier, on AHV, or not?

228
00:28:55.080 –> 00:29:09.090
Jirah Cox: Correct, yeah, because in this one, right, we’re sort of using the construct in vCenter of HA as a cluster property. Nutanix already knows, when it’s managing an AHV environment, that each cluster, you know, has its own HA control plane.

229
00:29:10.170 –> 00:29:18.270
Jirah Cox: So the steps are a little bit different to set it up, but yeah, automatic VM recovery across sync rep on AHV is also possible, yeah. Cool.

230
00:29:19.440 –> 00:29:25.110
Andy Whiteside: Well, guys, I think that wraps it up. This is very interesting. This, to me, is a little more interesting than Red Hat OpenShift, I have to admit.

231
00:29:27.900 –> 00:29:28.230
Andy Whiteside: i’m.

232
00:29:28.260 –> 00:29:29.190
Jirah Cox: Sorry, Red Hat.

233
00:29:30.420 –> 00:29:34.740
Andy Whiteside: It’s good stuff, it’s just not, you know, where my passion’s been.

234
00:29:36.360 –> 00:29:41.220
Andy Whiteside: There was a brief, like, three months there where I tried to become a Linux server guy in the early 2000s.

235
00:29:42.390 –> 00:29:58.710
Andy Whiteside: That and a Unix guy, and then I got away. I did not make it. I ended up back in the Windows and the VDI world, which I thought was more fun. But I get it, I love getting down deep into the kernel and command-lining myself away.

236
00:30:00.000 –> 00:30:04.080
Jirah Cox: And both are important. You could use this to protect your Red Hat OpenShift environment, for sure.

237
00:30:05.550 –> 00:30:10.590
Andy Whiteside: Well, guys, happy Monday and thanks for joining. This was fun, and we’ll do it again in a week.

238
00:30:11.430 –> 00:30:12.000
Jirah Cox: Every Wednesday.

239
00:30:12.960 –> 00:30:13.440
Harvey Green: See you guys.