63: Nutanix Weekly: Honey I Shrunk My Cluster (Multiple Nodes Down in RF2)

Dec 16, 2022

Within Nutanix we have Replication Factor (how many data copies are written in the cluster) and Redundancy Factor (how many nodes/disks can go offline). Both can have a value of 2 or 3. The difference between the two is explained here: Blog Post.

So, for larger clusters we always recommend using RF3 (Redundancy Factor 3), as the risk of multiple nodes/disks going offline at the same time is higher.
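To put a rough number on that risk, here is a minimal Python sketch. It assumes each node is independently unavailable with the same small probability at any given moment. The 1% figure is a hypothetical placeholder, not a Nutanix statistic, and real failures are often correlated (shared power, firmware, maintenance). Under those assumptions, the chance of two or more nodes being down at once, which is the case RF2 is not designed to absorb, comes out roughly an order of magnitude higher at 24 nodes than at 7.

```python
# Hypothetical helper for illustration; not a Nutanix API.
from math import comb


def prob_at_least_k_down(n_nodes: int, p_down: float, k: int) -> float:
    """Probability that at least k of n_nodes are unavailable at the same moment,
    assuming each node is independently down with probability p_down."""
    # Binomial probability of 0..k-1 nodes being down, then take the complement.
    below_k = sum(
        comb(n_nodes, i) * p_down**i * (1 - p_down) ** (n_nodes - i)
        for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    P_DOWN = 0.01  # hypothetical per-node unavailability, for illustration only
    for n in (4, 7, 16, 24, 48):
        # RF2 tolerates one concurrent failure, so two or more is the risky case.
        print(f"{n:>2} nodes: P(2+ down) = {prob_at_least_k_down(n, P_DOWN, 2):.4f}")
```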

During trainings and onsite customer work I often get the question, “what will happen if multiple nodes go offline in Redundancy Factor 2?” In this blog post I will explain different scenarios and their behaviors.

Blog by: Jeroen Tielen
Host: Andy Whiteside
Co-host: Harvey Green
Co-host: Philip Sellers
Co-host: Jirah Cox
Co-host: Ben Rogers

WEBVTT

1
00:00:02.270 –> 00:00:19.830
Andy Whiteside: Hello, everyone! Welcome to episode 63 of Nutanix Weekly. I’m your host, Andy Whiteside. It’s December 12th, 2022, and I’ve got a big crew of smart guys. Let me get them introduced here real quick. Harvey Green, my co-host from probably day one. Were you on day one already? I can’t remember. I feel like I was. But

2
00:00:20.050 –> 00:00:24.560
Harvey Green: who knows?

3
00:00:25.180 –> 00:00:30.590
Andy Whiteside: So, Harvey, this is your first December actually running a business, running this entire Gov business.

4
00:00:30.660 –> 00:00:33.099
Andy Whiteside: You know it’s getting ready to get crazy, right?

5
00:00:33.140 –> 00:00:35.100
Harvey Green: Yes, 100%.

6
00:00:35.600 –> 00:00:48.029
Harvey Green: It has already done that. I don’t mean to scare you, but that means you’re making money, as long as the people pay their bills. Yes, that is correct.

7
00:00:48.120 –> 00:00:54.029
Harvey Green: It’s always the caveat. It’s kind of the way most businesses run, I think, right?

8
00:00:54.230 –> 00:01:06.550
Andy Whiteside: Well, yeah, it is. I’ve been reading this book about, you know, an e-retailer from back in the 2000s. You guys may remember Value America. I didn’t remember it, and I do now, thanks to this book.

9
00:01:06.880 –> 00:01:13.319
Andy Whiteside: But it reminds me how many companies out there run with the idea of just creating revenue because they’re trying to sell themselves off as quick as they can.

10
00:01:13.490 –> 00:01:14.789
Andy Whiteside: and

11
00:01:15.300 –> 00:01:18.409
Andy Whiteside: You know, and when you have to pay your own bills, that’s a different world.

12
00:01:20.380 –> 00:01:26.210
Jirah Cox: Yeah, I think you’re either fundamentally in that customers-pay-the-bills model, or I guess you’re running a collections agency.

13
00:01:26.250 –> 00:01:27.730
Jirah Cox: Those are kind of your 2 choices.

14
00:01:27.770 –> 00:01:32.209
Andy Whiteside: Well, 100%. XenTegra does a lot of that. We track down a lot of payments.

15
00:01:32.480 –> 00:01:38.270
Philip Sellers: I think ultimately all businesses do, unless you pay up front or whatever. Philip Sellers, how’s it going?

16
00:01:38.690 –> 00:01:40.690
Philip Sellers: Good! How are you? Good!

17
00:01:40.740 –> 00:01:42.350
Andy Whiteside: So you are

18
00:01:42.930 –> 00:01:55.779
Andy Whiteside: coming from the customer side, a VMware customer specifically. Is the Nutanix piece something you’ve always looked at longingly from afar, or is this, you know, really your first foray into the Nutanix world?

19
00:01:56.420 –> 00:02:09.770
Philip Sellers: So I spent a day with Jirah at the Durham headquarters last month, and he made me a believer. Yes, I did mispronounce that. So...

20
00:02:09.850 –> 00:02:25.049
Philip Sellers: No, you know, it’s really interesting coming from a VMware background and looking at where Nutanix is at today. You know, I knew it when it was SimpliVity versus Nutanix, you know, the hyperconverged wars, back before there was vSAN.

21
00:02:25.160 –> 00:02:34.150
Philip Sellers: All of that kind of thing. So I’ve followed the company for a long time, and I’m really impressed with the ecosystem and services that they enable. So

22
00:02:34.160 –> 00:02:49.669
Philip Sellers: I would say what I see today is that it’s really a platform play. The investment for a customer is that you’re investing in a platform that’s gonna allow you to deliver the IT services that you need. And that’s a really cool...

23
00:02:49.920 –> 00:02:55.920
Philip Sellers: It’s a cool value prop when you start going out and talking to customers about where Nutanix is at today.

24
00:02:56.030 –> 00:02:56.750
Yeah.

25
00:02:56.950 –> 00:02:59.750
Andy Whiteside: Yeah, all on-prem, colo,

26
00:02:59.970 –> 00:03:04.010
Andy Whiteside: in the cloud, for that matter, on your terms, where you want it,

27
00:03:04.040 –> 00:03:05.420
Andy Whiteside: right, at any point.

28
00:03:06.170 –> 00:03:08.890
Jirah Cox: Happy to have you on the

29
00:03:09.180 –> 00:03:14.939
Andy Whiteside: You’ve been around for quite a while as well. You keep coming back, so you must be having a little bit of fun.

30
00:03:15.190 –> 00:03:16.690
Jirah Cox: Yeah, man. No, it’s a blast

31
00:03:16.840 –> 00:03:23.090
Jirah Cox: You guys are the only Nutanix-flavored partner podcast that I’m recording today.

32
00:03:23.520 –> 00:03:27.110
Philip Sellers: This week? Only one this month, I hope.

33
00:03:27.180 –> 00:03:28.690
Jirah Cox: Yeah.

34
00:03:28.860 –> 00:03:36.380
Andy Whiteside: We appreciate having you on here. You really do validate these conversations, and we get a lot of really good feedback from it.

35
00:03:37.430 –> 00:03:38.830
Jirah Cox: Yeah, it’s fun to do

36
00:03:39.310 –> 00:03:44.320
Andy Whiteside: Ben Rogers, you’ve been a customer and a friend of ours on multiple fronts.

37
00:03:44.350 –> 00:03:47.310
Andy Whiteside: Is doing this podcast what you thought it would be?

38
00:03:47.930 –> 00:04:00.920
Ben Rogers: It’s been very interesting. You know, when I did the Citrix podcast I had 25 years of Citrix under my belt, so I was very confident wherever we went. This is a little bit different of a ball game, and so there have been a couple of times I’ve showed up and

39
00:04:00.930 –> 00:04:18.250
Ben Rogers: learned on the fly. What I try to do is put it in a customer’s perspective and either ask questions I think customers would want to know the answers to, but might be afraid to ask, or lean on our friend the ninja Jirah here and try to get the scoop from him while I’ve got him on the line. Just

40
00:04:18.260 –> 00:04:36.479
Ben Rogers: being around him is definitely a great resource. So has it been what I thought? I don’t know if I knew what to think about it. But, man, I’m enjoying myself. Every time I publish this podcast I get positive feedback from Nutanix as an organization,

41
00:04:36.490 –> 00:04:48.010
Ben Rogers: and also from Nutanix customers out there who say they learned X, Y, and Z from listening to this. You know, they’re too polite to go, like, man, as a listener I thought there was a lot more preparation.

42
00:04:48.110 –> 00:04:57.340
Andy Whiteside: Well, like you said, he did the Citrix ones with us. So he saw the wing-it model, and it still produces fruit.

43
00:04:58.760 –> 00:05:10.419
Ben Rogers: All right, let’s digress on that for one second. For those of you that are listening to this, you know Andy’s philosophy on these podcasts is you learn the subject as you log in

44
00:05:10.430 –> 00:05:24.090
Ben Rogers: to the podcast. So all of us have about two minutes to digest what we’re gonna talk about in expert format. So for all of you that are listening, just kind of understand what we’re dealing with when we sit at the table.

45
00:05:24.650 –> 00:05:33.840
Andy Whiteside: Yes. But like in this case, the title of the blog is “Honey, I Shrunk My Cluster: Multiple Nodes Down in RF2,”

46
00:05:33.850 –> 00:05:47.499
Andy Whiteside: by what looks like Jeroen Thielen would be the name, if I try to pronounce that last name. That’s what I would guess. I don’t know where I got the “th”; there was no “th”. I made that up.

47
00:05:47.850 –> 00:05:50.650
Andy Whiteside: Tielen. That makes total sense. But

48
00:05:50.700 –> 00:05:56.690
Andy Whiteside: Jirah, you brought this blog, and thankfully you bring a lot of these prepared to talk about them.

49
00:05:56.850 –> 00:06:01.110
Andy Whiteside: What’s the gist of what happened here, and why did he write this?

50
00:06:01.770 –> 00:06:08.479
Jirah Cox: Yes. I love this post from Jeroen. It’s on our community blog, by the way, where, you know, anybody can join as a member of the community. And

51
00:06:08.610 –> 00:06:14.100
Jirah Cox: you know, if you’ve got something you want to say, it’s a great platform for it.

52
00:06:14.330 –> 00:06:30.819
Jirah Cox: Jeroen posted this write-up from conversations he’s had before of, you know, hey, if I’m running, in this case, an example of 7 nodes with RF2, right, which is our structure for maintaining a data protection SLA of writing all data twice within a cluster,

53
00:06:30.970 –> 00:06:37.350
Jirah Cox: so that cluster, we say it can lose one of anything, right, a disk or a node. What happens if it loses 2,

54
00:06:37.380 –> 00:06:48.559
Jirah Cox: right? And what happens in that case? So great question, great exploration here. It’s a real-world, customer-facing question that we get a lot, actually, right? It’s like, so we plan for this much failure, but what if we get more failure than that?
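The “lose one of anything” property follows from where the copies land. The sketch below is a simplified model, assuming each extent’s two copies are placed on two randomly chosen distinct nodes; the real placement logic is more involved and is not modeled here. It shows why a single node loss never makes data unreadable, while two simultaneous losses, before any rebuild has finished, only affect the extents whose copies happened to share exactly that pair of nodes.

```python
# Hypothetical helpers for illustration; not a Nutanix API.
import random


def place_replicas(num_extents: int, nodes: list[str], rf: int,
                   seed: int = 0) -> dict[int, frozenset[str]]:
    """Put rf copies of each extent on rf *distinct* nodes, chosen at random.
    Simplified stand-in for real placement, which is not modeled here."""
    rng = random.Random(seed)
    return {e: frozenset(rng.sample(nodes, rf)) for e in range(num_extents)}


def lost_extents(placement: dict[int, frozenset[str]], failed: set[str]) -> list[int]:
    """An extent becomes unreadable only if *all* of its copies are on failed nodes."""
    return [e for e, copies in placement.items() if copies <= failed]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 8)]  # 7-node cluster, as in the blog
    placement = place_replicas(10_000, nodes, rf=2)

    # One node down: every extent still has a surviving copy.
    print(len(lost_extents(placement, {"node1"})))  # prints 0

    # Two nodes down at once, before any rebuild: only extents whose two copies
    # landed on exactly this pair are affected (about 1 in C(7, 2) = 21).
    print(len(lost_extents(placement, {"node1", "node2"})))
```

With `rf=3`, the same two-node loss leaves every extent with at least one copy, which is the trade behind the RF3 recommendation for larger clusters.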

55
00:06:49.020 –> 00:06:59.150
Andy Whiteside: So when he says in the second paragraph that when we have larger clusters we always recommend RF3, is that just because

56
00:06:59.230 –> 00:07:01.689
Andy Whiteside: more is better if you have the space.

57
00:07:02.170 –> 00:07:11.700
Jirah Cox: Yeah, and I’ve seen some of the stats. It comes down to, you know, like anything, right? If you ran even just, like, hypervisor only,

58
00:07:11.770 –> 00:07:17.320
Jirah Cox: if you’re thinking about compute availability: well, gee, if I run a 10-node cluster or a 100-node cluster,

59
00:07:17.380 –> 00:07:20.600
Jirah Cox: at some point I want to size beyond just N+1,

60
00:07:20.690 –> 00:07:22.879
Jirah Cox: right? Like, I wouldn’t run a 99-node

61
00:07:22.910 –> 00:07:28.219
Jirah Cox: compute cluster of any hypervisor with only N+1 availability, because my odds as a

62
00:07:28.250 –> 00:07:33.889
Jirah Cox: practitioner, as the admin, of having more than one availability zone, right, a blast radius, call it,

63
00:07:34.000 –> 00:07:44.680
Jirah Cox: a single node, a compute factor, down at any one time, those odds increase astronomically beyond a certain count threshold. It always depends, but, you know, usually,

64
00:07:45.310 –> 00:08:03.189
Jirah Cox: let me fairly represent a lot of viewpoints here, most SEs would tell you that by the time you’re hitting, like, node 24 in a cluster, we’re probably at a threshold where we want to be designing for what we call RF3, right, of all data written 3 times within the cluster versus all data written twice in a cluster,

65
00:08:03.200 –> 00:08:06.969
Jirah Cox: because my odds of having one node fail, and then another node

66
00:08:07.030 –> 00:08:15.830
Jirah Cox: fail after that, or maybe I’m doing maintenance, I have one down, and then another node chooses that time to purple screen, or whatever it is, increase at that size.
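Jirah’s point about a second failure landing during a maintenance or rebuild window can be sketched with a simple exponential failure model. Both the annual failure rate and the window length below are hypothetical inputs chosen for illustration; the takeaway is only that the chance of a second failure while one node is already down grows with the number of remaining nodes.

```python
# Hypothetical helper for illustration; not a Nutanix API.
import math


def prob_second_failure(n_nodes: int, annual_fail_rate: float,
                        window_hours: float) -> float:
    """Chance that at least one of the *remaining* nodes fails while one node is
    already down (failure, rebuild, or maintenance window).

    Assumes independent failures at a constant per-node rate (exponential model).
    """
    hourly_rate = annual_fail_rate / (365 * 24)
    survivors = n_nodes - 1
    return 1.0 - math.exp(-survivors * hourly_rate * window_hours)


if __name__ == "__main__":
    AFR = 0.05   # hypothetical: 5% of nodes fail per year
    WINDOW = 8   # hypothetical: hours one node is out for maintenance or rebuild
    for n in (7, 24, 99):
        print(n, f"{prob_second_failure(n, AFR, WINDOW):.5f}")
```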

67
00:08:16.160 –> 00:08:21.669
Andy Whiteside: That’s kind of my wife’s logic. I’ve got 4 cars, most of them older, and

68
00:08:22.100 –> 00:08:31.109
Jirah Cox: chances are, the more older cars I get, the more chances one of them is broken down. Yeah, fantastic analogy, right? I mean, I always say hardware’s gonna hardware. So

69
00:08:32.330 –> 00:08:33.860
Andy Whiteside: it’s gonna do what it’s gonna do.

70
00:08:33.970 –> 00:08:35.090
Harvey Green: True story

71
00:08:35.159 –> 00:08:45.099
Andy Whiteside: All right. So he’s got a couple of screenshots of his environment. The first one is Redundancy Factor Readiness with a value of 2. The options, if I had a drop-down here, would be

72
00:08:45.160 –> 00:08:47.860
Andy Whiteside: 2 and 3. Are there any other options?

73
00:08:48.650 –> 00:09:04.569
Jirah Cox: Basically, that’s true for, like, you know, 99.99, a lot of nines here, of most workloads. It’s going to be RF2 and RF3. For a very small number of workloads where you’re doing in-app redundancy, right, and availability, I’m thinking of, like, [inaudible], I’m thinking of

74
00:09:04.840 –> 00:09:18.059
Jirah Cox: a couple of the workloads where the application is already gonna split that, right, and store it in 2 places. Then, of course, you can do RF1, understanding that when you’re running RF1, if something goes bump, like a disk fails or gets yanked,

75
00:09:18.110 –> 00:09:18.910
Jirah Cox: Then

76
00:09:18.940 –> 00:09:30.790
Jirah Cox: you told us you already have availability of the application elsewhere, right? So we don’t rebuild that data. But yeah, for the most part, for all common virtualization use cases, it’s going to be RF2 and RF3.
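As a back-of-the-envelope comparison of the three settings discussed here, the sketch below assumes usable capacity is simply raw capacity divided by the number of copies, and that RF n survives n - 1 concurrent failures. It deliberately ignores metadata, CVM overhead, data reduction, and the free space needed for rebuilds, and the 70 TiB figure is hypothetical.

```python
# Hypothetical helper for illustration; not a Nutanix API.
from dataclasses import dataclass


@dataclass
class RFProfile:
    rf: int
    copies_written: int
    failures_tolerated: int
    usable_tib: float


def rf_profile(raw_tib: float, rf: int) -> RFProfile:
    """Naive view of a replication factor: rf copies of every write, so usable
    space is raw / rf and rf - 1 concurrent failures can be absorbed.
    Ignores metadata, CVM overhead, data reduction, and rebuild headroom."""
    if rf not in (1, 2, 3):
        raise ValueError("this sketch only models RF1, RF2, and RF3")
    return RFProfile(rf, rf, rf - 1, raw_tib / rf)


if __name__ == "__main__":
    RAW_TIB = 70.0  # hypothetical: 7 nodes x 10 TiB raw
    for rf in (1, 2, 3):
        print(rf_profile(RAW_TIB, rf))
```

RF1 shows zero tolerated failures, which is why it only makes sense when the application itself keeps another copy.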

77
00:09:31.250 –> 00:09:45.979
Andy Whiteside: I have to tell a little joke on myself. For the longest time, I said RF stood for “right frequency,” and then somebody pointed out that “write” starts with a W. And I was like, yeah. So, Andy, I think there is one thing that’s worth mentioning here,

78
00:09:45.990 –> 00:10:05.710
Ben Rogers: and Philip kind of hinted at it when he gave his introduction. This is really what creates the platform of Nutanix: our ability to do this RF factor. Not only is it protecting us from a data protection standpoint, but it also comes into play with how quickly the cluster will recover. You know, if we’ve got that data spread like peanut butter

79
00:10:05.720 –> 00:10:20.050
Ben Rogers: across these nodes, if a node fails, another node can take over really quick because it’s got the data on it. Also, we talked a lot about VDI in your podcast, and this is also what makes our VDI work well.

80
00:10:20.060 –> 00:10:36.669
Ben Rogers: Even though we’re replicating the data off, we’re still keeping that workload local on the node where the data is. And again, if that node were to fail, we have the metadata to know where to pick it up, you know, where we need to go next to get that. So again, going back to Philip’s

81
00:10:36.680 –> 00:10:46.930
Ben Rogers: you know, mention of this being a platform: this is at the heart of this platform, and this is really what makes Nutanix sing when it comes to things like

82
00:10:47.010 –> 00:10:55.769
Ben Rogers: performance, replication, disaster recovery. These are all things we hang on this idea of redundancy factor.
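Ben’s description of data locality plus metadata-driven recovery can be illustrated with a tiny read-path model. The extent map, node names, and function below are hypothetical and only show the idea: serve a read from the local copy when there is one, otherwise look up a surviving replica in metadata.

```python
# Hypothetical helper for illustration; not a Nutanix API.
def choose_read_source(extent_copies: dict[str, list[str]], extent: str,
                       local_node: str, failed: set[str]) -> str:
    """Pick the node that serves a read for `extent`.

    Prefer the local copy (data locality); otherwise fall back to the first
    surviving replica recorded in metadata. Raises if no copy survives.
    """
    copies = extent_copies[extent]
    if local_node in copies and local_node not in failed:
        return local_node
    for node in copies:
        if node not in failed:
            return node
    raise LookupError(f"no surviving copy of {extent}")


if __name__ == "__main__":
    # Hypothetical metadata: extent-42 has copies on node1 and node4.
    extent_map = {"extent-42": ["node1", "node4"]}
    print(choose_read_source(extent_map, "extent-42", "node1", set()))      # node1
    # node1 dies; the VM restarts on node3 and reads the surviving replica.
    print(choose_read_source(extent_map, "extent-42", "node3", {"node1"}))  # node4
```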

83
00:10:57.050 –> 00:11:02.110
Jirah Cox: Totally. [inaudible], Andy. I was gonna say, you know, all 5 of us here, right,

84
00:11:02.260 –> 00:11:11.590
Jirah Cox: as technologists in the Carolinas, we’re already fighting an uphill battle, right? Thank you for not, like, you know, highlighting our bad spelling on top of that, right?

85
00:11:12.500 –> 00:11:15.060
Andy Whiteside: Oh, you must know where I went to elementary school.

86
00:11:17.040 –> 00:11:19.749
Jirah Cox: So I said all 5 of us here, right? I cast a wide net.

87
00:11:21.360 –> 00:11:27.100
Andy Whiteside: So, if you want to tell us what this Manage VM High Availability piece means.

88
00:11:28.250 –> 00:11:34.529
Philip Sellers: Probably looking at me like, you’re not sharing the screen, you idiot! Yeah, this part.

89
00:11:35.520 –> 00:11:37.580
Philip Sellers: Zoom the

90
00:11:39.490 –> 00:11:40.810
Philip Sellers: Hey, go ahead, Jirah.

91
00:11:41.080 –> 00:11:52.559
Jirah Cox: Oh, sure. So the checkbox here, that’s what you’re sharing on screen there, is that HA reservation, right? So as the compute layer, right, as virtualization, managing the virtual machines, right, you can

92
00:11:52.670 –> 00:12:05.920
Jirah Cox: run the high availability engine in basically 2 models. Out of the box, you’re getting HA, and you’re getting best effort, right, where impacted VMs from, like, a hardware failure event will get restarted automatically on surviving nodes in the cluster.

93
00:12:06.100 –> 00:12:23.940
Jirah Cox: That is, of course, out of the box: you get best effort. And then with this checkbox you can opt into, hey, go ahead and pre-reserve memory for me, so I get guaranteed availability. My VMs already have a pre-reserved space in which to boot back up. Basically, you’re moving from an N+0 VM memory model to N+1.
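The checkbox reserves headroom ahead of time; the sketch below only checks whether that headroom exists. It is an aggregate N+1 test under simplifying assumptions (per-VM fit and CVM memory are ignored), not how AHV actually computes its reservation, and the 512 GiB host size is hypothetical.

```python
# Hypothetical helper for illustration; not an AHV API.
def can_absorb_worst_node(node_capacity_gib: list[float],
                          vm_mem_used_gib: list[float]) -> bool:
    """Aggregate N+1 memory check: if the node hosting the most VM memory fails,
    is there enough free memory on the survivors to restart all of its VMs?

    Simplification: ignores per-VM fit (a large VM may not fit on any single
    node even when total free memory looks sufficient) and CVM memory.
    """
    worst = max(range(len(vm_mem_used_gib)), key=lambda i: vm_mem_used_gib[i])
    free_on_survivors = sum(
        cap - used
        for i, (cap, used) in enumerate(zip(node_capacity_gib, vm_mem_used_gib))
        if i != worst
    )
    return free_on_survivors >= vm_mem_used_gib[worst]


if __name__ == "__main__":
    caps = [512.0] * 7  # hypothetical 512 GiB hosts, 7 nodes as in the blog
    print(can_absorb_worst_node(caps, [0.4 * c for c in caps]))  # True at ~40% used
    print(can_absorb_worst_node(caps, [0.9 * c for c in caps]))  # False at 90% used
```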

94
00:12:24.200 –> 00:12:24.890
Andy Whiteside: Yeah.

95
00:12:25.520 –> 00:12:38.079
Andy Whiteside: Yeah, I jokingly brought that up to Philip because some of these things have been around for a while now in the Nutanix platform, but when I meet customers, they assume some of this stuff is only available on the VMware side of

96
00:12:38.130 –> 00:12:39.270
the solution.

97
00:12:39.870 –> 00:12:55.649
Philip Sellers: Yeah. And I mean, this is essentially the same as VMware, so you’ve got parity here. You know, in VMware HA you put in the number of host failures to tolerate, you know, the toleration level, and it reserves that space

98
00:12:55.710 –> 00:13:01.810
Philip Sellers: very similarly, although it’s a slightly different take, I guess, here from the Nutanix platform side.

99
00:13:02.180 –> 00:13:16.569
Andy Whiteside: Well, and Phil, that’s something I would ask you, coming over from a pure VMware world to where now you do both. Have you been surprised at how many of these features are available on the Acropolis side of things that maybe weren’t available when you looked at it a while back, if ever?

100
00:13:16.980 –> 00:13:20.249
Philip Sellers: Oh, yeah, yeah. I mean, it’s

101
00:13:20.760 –> 00:13:26.009
Philip Sellers: It’s pretty incredible what’s been done out on the platform, and

102
00:13:26.300 –> 00:13:30.919
Philip Sellers: enabled on AHV. It’s, you know, it’s

103
00:13:32.180 –> 00:13:36.300
Philip Sellers: it’s very comparable, coming from a VMware background.

104
00:13:37.690 –> 00:13:46.580
Jirah Cox: So, I mean, I usually say it’s different. There’s not a checkbox for every checkbox you used to see on another platform, but it’s got everything you need.

105
00:13:48.240 –> 00:13:51.849
Andy Whiteside: So, Jirah, I’m gonna walk through this and kind of let you

106
00:13:52.240 –> 00:13:57.940
Andy Whiteside: go through what the customer had set up and give me the insight, and then I’ll let the guys just interrupt us and

107
00:13:58.200 –> 00:14:02.939
Andy Whiteside: comment as needed. Do you want to hit here, where he’s talking about what the workload is?

108
00:14:04.190 –> 00:14:11.870
Jirah Cox: Yeah. So Jeroen highlights he’s got 30 Windows VMs. They’re running Windows 11 [inaudible] with vTPM enabled.

109
00:14:12.160 –> 00:14:14.720
Jirah Cox: And of course, you know, those VMs are spread across

110
00:14:14.770 –> 00:14:23.979
Jirah Cox: nodes in the cluster, right? So there are some VMs running on every node in the cluster. Roughly, what would that be, 4 and a half or so VMs per host, on average?

111
00:14:24.470 –> 00:14:30.839
Jirah Cox: So nothing’s maxed out on resources. He highlights these at, like, 25% CPU consumption, about 40% memory consumption.
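Using the numbers from the blog (30 VMs on 7 nodes, so about 4.3 VMs per host, at roughly 25% CPU and 40% memory), here is a rough sketch of how per-host load grows as nodes drop out and their VMs are restarted elsewhere. It assumes identical hosts and perfectly even load, so treat the output as a ballpark only.

```python
# Hypothetical helper for illustration; not a Nutanix API.
def after_node_losses(total_vms: int, nodes: int, avg_cpu_pct: float,
                      avg_mem_pct: float, nodes_lost: int) -> tuple[float, float, float]:
    """Rough per-host picture once `nodes_lost` hosts are gone and their VMs
    have been restarted on the survivors. Assumes identical hosts and evenly
    spread load. Returns (vms_per_host, cpu_pct, mem_pct)."""
    survivors = nodes - nodes_lost
    if survivors <= 0:
        raise ValueError("no surviving hosts")
    scale = nodes / survivors  # the same total load spread over fewer hosts
    return total_vms / survivors, avg_cpu_pct * scale, avg_mem_pct * scale


if __name__ == "__main__":
    for lost in (0, 1, 2):
        vms, cpu, mem = after_node_losses(30, 7, 25.0, 40.0, lost)
        print(f"{lost} lost: {vms:.1f} VMs/host, CPU {cpu:.0f}%, memory {mem:.0f}%")
```

With one node lost this works out to 5 VMs per host at about 47% memory; with two lost, 6 VMs per host at about 56% memory, which is still comfortable for compute. Storage capacity and rebuild are a separate question.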

112
00:14:32.470 –> 00:14:34.109
Jirah Cox: So then,

113
00:14:34.710 –> 00:14:37.569
Andy Whiteside: How many VDI was it? Oh, 30.

114
00:14:37.970 –> 00:14:53.140
Jirah Cox: Yep, 30 VMs. So then he puts on his Chaos Monkey hat and decides to go crash a node. So he logs into the out-of-band hardware management and, you know, tells it, hey, power off server immediately. No warning,

115
00:14:53.150 –> 00:15:01.790
Jirah Cox: no notification to, like, the virtualization layer, the storage layer, the management layer, right? So it’s like immediately pulling the power cord. One node goes off.

116
00:15:02.510 –> 00:15:19.359
Jirah Cox: So then, of course, you know, no surprise, what you’d expect, right? HA kicks in, and VMs get restarted on surviving nodes in the cluster, right? So only the impacted VMs, of course, have to do anything, right? Other VMs just keep on running. So the remaining nodes power those on.

117
00:15:19.560 –> 00:15:34.199
Jirah Cox: On the dashboard, right, in Prism, we give a lot of pixels right there on the dashboard to showing the administrator cluster state and what’s going on, right? What operations are we performing? What are we recovering from? Or is everything just totally situation normal? So immediately it shows it goes into a healing state.

118
00:15:34.210 –> 00:15:39.750
Jirah Cox: It, like, alerts that, hey, VMs are getting migrated or restarted in order to get back to a highly available state.

119
00:15:40.210 –> 00:15:56.500
Jirah Cox: And then, of course, storage rebuilds also occur, right? So when you lose the hardware instance, right, the hypervisor instance, you’re gonna lose some slice of all your user VMs, your customer-provisioned VMs, and then, of course, our virtual machine as well, right? Our CVM, our Controller VM.

120
00:15:56.510 –> 00:16:00.250
Jirah Cox: So we run one on every node in the cluster, so that CVM goes down as well.

121
00:16:00.810 –> 00:16:07.810
Jirah Cox: It was hosting some portion of the data, right, roughly, in this case, one-seventh of all the data stored in the cluster.

122
00:16:08.020 –> 00:16:16.989
Jirah Cox: And so the other 6 surviving CVMs are going to start that rebuild to pick up the slack from that failed node, and therefore the failed CVM as well.
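The “other 6 surviving CVMs pick up the slack” idea can be sketched numerically. Assuming data is spread evenly and every survivor re-replicates an equal share in parallel, each one handles about a sixth of the failed node’s data. The 70 TB total and the 1 GB/s per-node rebuild rate are hypothetical; real rebuild speed depends on disks, network, and background throttling.

```python
# Hypothetical helper for illustration; not a Nutanix API.
def rebuild_estimate(used_tb: float, nodes: int,
                     per_node_rebuild_gb_s: float) -> tuple[float, float, float]:
    """Rough rebuild math after one node fails.

    used_tb: data stored in the cluster, counting every replica.
    per_node_rebuild_gb_s: hypothetical sustained re-replication rate per surviving CVM.
    Returns (tb_to_rebuild, tb_per_survivor, estimated_minutes).
    """
    lost_tb = used_tb / nodes              # the failed node's share of all copies
    survivors = nodes - 1
    per_survivor_tb = lost_tb / survivors  # survivors split the work in parallel
    minutes = per_survivor_tb * 1000 / per_node_rebuild_gb_s / 60  # decimal TB -> GB
    return lost_tb, per_survivor_tb, minutes


if __name__ == "__main__":
    print(rebuild_estimate(70.0, 7, 1.0))
```

With these made-up inputs that is 10 TB to re-protect, about 1.7 TB per surviving CVM, and just under half an hour; the point is the parallel fan-out, not the specific number.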

123
00:16:17.820 –> 00:16:23.390
Andy Whiteside: So, Jirah, it didn’t say, or did it, whether these were persistent or non-persistent VDI?

124
00:16:24.670 –> 00:16:28.620
Jirah Cox: Good question. It didn’t say, and let me think if that matters.

125
00:16:28.680 –> 00:16:30.880
Jirah Cox: it really

126
00:16:31.060 –> 00:16:39.629
Jirah Cox: doesn’t, right? Because if it’s non-persistent, you’re gonna get back to that pristine image, whatever your, you know, deep-freeze state of that gold master is.

127
00:16:39.860 –> 00:16:46.409
Jirah Cox: If it’s persistent, then that’s simply just an HA event: power off, clean boot on another node.

128
00:16:46.750 –> 00:17:02.180
Philip Sellers: It just may take a little longer to grab all the bits for non-persistent, because there are likely more of those, so the healing process will maybe take a little longer. Possibly. But to chase the tangent, if you’re using non-persistent, you’re probably doing

129
00:17:02.360 –> 00:17:05.090
Jirah Cox: Well, if you’re doing PVS, right, then whatever network

130
00:17:05.270 –> 00:17:16.910
Jirah Cox: is gonna network. If you’re doing MCS, right, then you’re doing differencing disks, right, against a gold master snapshot, and actually, under the covers, we’ll do what we call Shadow Clones,

131
00:17:16.980 –> 00:17:24.189
Jirah Cox: where, whenever we detect that there’s one vDisk in the cluster that’s doing extra duty, right, it’s one vDisk powering multiple VMs,

132
00:17:24.250 –> 00:17:26.530
Jirah Cox: we actually will cache that

133
00:17:26.609 –> 00:17:44.220
Jirah Cox: locally on every node in the cluster, so that all those reads go to local flash. We’ll actually intercept that read operation from the hypervisor down to the vDisk, and we serve it locally, even if the authoritative copy was elsewhere in the cluster. So we’re shortening that read path. You really won’t feel that, I don’t think.
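A toy model of the Shadow Clones behavior described above follows. The class, the “read from more than one node” trigger, and the node names are assumptions made for illustration; the real feature’s detection rules are not modeled here.

```python
# Hypothetical toy model for illustration; not the actual Shadow Clones logic.
from collections import defaultdict


class ShadowCloneTracker:
    """When one base vDisk is read from more than one node (assumed trigger),
    each remote reader gets a local cached copy so later reads stay on that
    node instead of crossing the network to the node holding the owner copy."""

    def __init__(self) -> None:
        self.readers: dict[str, set[str]] = defaultdict(set)      # vdisk -> reading nodes
        self.local_cache: dict[str, set[str]] = defaultdict(set)  # node -> cached vdisks

    def read(self, vdisk: str, reader_node: str, owner_node: str) -> str:
        """Record a read and return the node that served it."""
        self.readers[vdisk].add(reader_node)
        if reader_node == owner_node:
            return owner_node                 # already local
        if vdisk in self.local_cache[reader_node]:
            return reader_node                # served from the local shadow copy
        if len(self.readers[vdisk]) > 1:
            # Multi-reader vDisk detected: populate a local copy for next time.
            self.local_cache[reader_node].add(vdisk)
        return owner_node                     # this read still goes remote


if __name__ == "__main__":
    tracker = ShadowCloneTracker()
    print(tracker.read("gold-image", "node1", owner_node="node1"))  # node1 (local)
    print(tracker.read("gold-image", "node2", owner_node="node1"))  # node1 (remote, now cached)
    print(tracker.read("gold-image", "node2", owner_node="node1"))  # node2 (local copy)
```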

134
00:17:44.840 –> 00:17:45.610
Philip Sellers: so.

135
00:17:45.900 –> 00:17:55.769
Andy Whiteside: So, Harvey, am I not right to say that if you were doing non-persistent, and you were using MCS or PVS, this checkbox related to the hardware reservation

136
00:17:55.830 –> 00:17:59.940
Andy Whiteside: probably doesn’t matter, because you’ve got a pool of machines to cover this?

137
00:17:59.990 –> 00:18:11.940
Harvey Green: Yeah, you don’t have to have that extra reservation, because you do have a pool, and it doesn’t matter if you lose some; you just restart more of them. The user might

138
00:18:12.230 –> 00:18:15.100
Harvey Green: lose their session.

139
00:18:15.480 –> 00:18:19.760
Harvey Green: for you know, for a minute. But then they’re able to just restart another one.

140
00:18:20.040 –> 00:18:21.430
Jirah Cox: Yeah, there’s

141
00:18:21.510 –> 00:18:29.729
Jirah Cox: You’re saying a model where you’re doing, like, one-to-many, right? You’re doing lots of users per VM instance, not one-to-one.

142
00:18:30.110 –> 00:18:35.370
Andy Whiteside: It could be one-to-one with machines that are always running. So let’s say he had 30,

143
00:18:35.390 –> 00:18:42.680
Andy Whiteside: and the truth is he had 22 users, or maybe he had 30 users. He would probably have 35 to 40 up and running, so he would have probably lost some,

144
00:18:42.820 –> 00:18:46.390
Andy Whiteside: you know, some of the users would have come back in, and there would have been more machines.

145
00:18:46.430 –> 00:18:57.820
Andy Whiteside: The hypervisor, not the hypervisor, but the control plane would have recognized, hey, I’m supposed to have 10 machines running and waiting. Something happened, I’m down to 9 or 5, or whatever, so I need to turn a bunch on, on whatever’s still up.
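The broker behavior Andy describes (“I’m supposed to have 10 machines running and waiting”) boils down to a reconciliation step. This is a generic, hypothetical sketch with made-up parameter names, not Citrix’s actual power-management algorithm.

```python
# Hypothetical helper for illustration; not any vendor's algorithm.
def machines_to_power_on(target_idle: int, idle_running: int,
                         max_pool: int, total_running: int) -> int:
    """How many pooled desktops to start to get back to the idle target
    without exceeding the pool size."""
    shortfall = max(0, target_idle - idle_running)  # idle machines we are missing
    headroom = max(0, max_pool - total_running)     # machines we are allowed to add
    return min(shortfall, headroom)


if __name__ == "__main__":
    # Pool wants 10 idle desktops; a host failure left only 5 idle out of 30 running.
    print(machines_to_power_on(target_idle=10, idle_running=5,
                               max_pool=40, total_running=30))  # 5
```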

146
00:18:58.250 –> 00:18:58.940
Jirah Cox: Hmm.

147
00:18:59.930 –> 00:19:06.110
Andy Whiteside: So chances are good a consultant would never check this box unless it was persistent. But who knows? I don’t know what the situation is.

148
00:19:08.080 –> 00:19:13.079
Andy Whiteside: Anyway, all right. So the rebuild. And then, Jirah, maybe I’m here,

149
00:19:13.780 –> 00:19:19.240
Andy Whiteside: am I here, where it says Data Resiliency Status, or no? Yeah. So the rebuild

150
00:19:19.360 –> 00:19:27.380
Jirah Cox: continues, and immediately it shows, you know, my cluster, which is built to lose one of anything at once,

151
00:19:27.430 –> 00:19:46.580
Jirah Cox: whether it’s a disk or a node or whatever. So on the dashboard, right, it’s going to show Fault Tolerance is 0, right? We’ve got a little red flag next to it. You can click on it, right, and it’ll even give you the detailed view of what’s currently rebuilding. And you know, the Nutanix cluster, right, and really what the CVM is doing for you all day, every day,

152
00:19:46.590 –> 00:20:00.499
Jirah Cox: it’s not one giant monolithic application, right, like running the CVM or running AOS. It’s got a whole bunch of microservices within the cluster that all work together, right, that all create the cluster, or the ring topology.

153
00:20:00.620 –> 00:20:13.369
Jirah Cox: Each one can, like, self-elect into different leader and follower states, so it’ll show you which part. In this screenshot here, it’s showing the Cassandra ring itself, the metadata partition, right, which is where we store our data about customer data

154
00:20:13.380 –> 00:20:26.859
Andy Whiteside: in the cluster. That’s the layer that’s rebuilding in this case, back to resiliency, back to health, to where it can lose another one of anything, right, a member in that ring. So let’s get back to that RF2 state where it’s fully recovered.

155
00:20:27.160 –> 00:20:37.099
Jirah Cox: You can think of it as getting back to RF2 across a number of measures, right? RF2 for user data, RF2 for the metadata about the cluster itself.
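To illustrate what “rebuilding the ring back to RF2” means, here is a toy token ring in the style of Cassandra: each key is owned by the first node clockwise from its hash and replicated to the next distinct node. It is a simplified illustration under that assumption, not Nutanix’s metadata implementation; the node and key names are hypothetical.

```python
# Hypothetical toy ring for illustration; not Nutanix's metadata implementation.
import bisect
import hashlib


def _h(value: str) -> int:
    # Stable hash so placement is reproducible across runs.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class MetadataRing:
    """One token per node: a key belongs to the first node clockwise from its
    hash, and the next (rf - 1) nodes on the ring hold the extra copies."""

    def __init__(self, nodes: list[str], rf: int = 2) -> None:
        self.rf = rf
        self.tokens = sorted((_h(n), n) for n in nodes)

    def replicas(self, key: str) -> list[str]:
        hashes = [t for t, _ in self.tokens]
        start = bisect.bisect(hashes, _h(key)) % len(self.tokens)
        count = min(self.rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

    def remove_node(self, node: str) -> None:
        """Eject a failed node; its key ranges fall to the next nodes on the ring."""
        self.tokens = [(t, n) for t, n in self.tokens if n != node]


if __name__ == "__main__":
    ring = MetadataRing([f"cvm{i}" for i in range(1, 8)], rf=2)
    keys = [f"vdisk-{i}" for i in range(1000)]
    affected = [k for k in keys if "cvm7" in ring.replicas(k)]
    ring.remove_node("cvm7")  # the failed node is ejected from the ring
    # Keys that had a copy on cvm7 now map to two surviving nodes; that is the
    # metadata that must be re-replicated to get back to RF2.
    print(len(affected), ring.replicas(affected[0]) if affected else None)
```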

156
00:20:38.100 –> 00:20:39.010
Andy Whiteside: and

157
00:20:39.260 –> 00:20:42.299
Andy Whiteside: Jirah, what happens if there’s not enough available

158
00:20:42.650 –> 00:20:44.190
Andy Whiteside: space.

159
00:20:44.220 –> 00:20:46.469
Andy Whiteside: Or I guess they would have told you that before this happened.

160
00:20:47.320 –> 00:20:56.609
Jirah Cox: So, I mean, that’s part of the planning, right? We never want any customer to be in a place where they couldn’t lose their defined availability

161
00:20:56.820 –> 00:21:01.940
Jirah Cox: threshold, right? Like, in an RF2 cluster, right, you can lose one of anything,

162
00:21:02.050 –> 00:21:12.199
Jirah Cox: which includes a node. We don’t want you to be within a node’s worth of filling up the cluster, right? If you are, then, yeah, totally, the cluster is going to fill up, and you’re gonna have a bad day.

163
00:21:12.210 –> 00:21:24.749
Jirah Cox: What it really does, if you really just fill it up and you run out of space, is it’s going to go into a read-only state to protect itself and protect your data, to say, I’m gonna disallow any new writes because we’re just plumb full.

164
00:21:24.940 –> 00:21:29.099
Jirah Cox: You know, call support, we need to get this fixed, or call, you know,

165
00:21:29.350 –> 00:21:35.680
Jirah Cox: call Ben, call Harvey, right, call Phil. We need to get you more space in here, right? So, you know, bigger disks, another node, whatever it takes to

166
00:21:35.960 –> 00:21:44.599
Jirah Cox: establish health there in the cluster. Often it can be, like, delete some old snapshots if you want to. But yeah, if you run out of space, it’s just gonna go into a read-only state.
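The planning rule Jirah describes (never be within a node’s worth of full) can be sketched as a feasibility check: after losing some nodes, can the survivors still hold every replica? The capacities below are hypothetical, and real clusters also reserve space for overhead, so actual headroom is smaller.

```python
# Hypothetical helper for illustration; not a Nutanix API.
def can_restore_rf(node_capacity_tb: float, nodes: int, used_tb: float,
                   nodes_lost: int, rf: int = 2) -> bool:
    """Can the surviving nodes hold all replicas again after `nodes_lost` nodes
    drop out? used_tb counts every replica (10 TB of VM data at RF2 uses ~20 TB).
    Ignores reserved and overhead space, so real headroom is smaller."""
    survivors = nodes - nodes_lost
    if survivors < rf:
        return False  # not enough nodes left to keep rf copies on distinct nodes
    return used_tb <= node_capacity_tb * survivors


if __name__ == "__main__":
    # Hypothetical: 7 nodes x 10 TB, 55 TB used (all replicas counted).
    print(can_restore_rf(10.0, 7, 55.0, nodes_lost=1))  # True: 60 TB left >= 55 TB
    print(can_restore_rf(10.0, 7, 55.0, nodes_lost=2))  # False: 50 TB left < 55 TB
```

The second case is the “shrunk cluster” scenario: the rebuild would run out of room, and a full cluster goes read-only to protect the data.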

167
00:21:45.690 –> 00:21:51.619
Andy Whiteside: Let me pause here. Ben, Harvey, Philip, any questions, comments, takeaways?

168
00:21:53.790 –> 00:21:57.559
Harvey Green: No, I mean, I think that this is

169
00:21:58.040 –> 00:22:06.739
Harvey Green: definitely describing where you’ll be, and you just call Phil, and he’ll show up with all kinds of drives in his back pocket. He can just switch them out for you.

170
00:22:06.790 –> 00:22:12.700
Jirah Cox: You just run down to the Best Buy and, you know, snag some drives.

171
00:22:16.000 –> 00:22:33.739
Ben Rogers: Well, I mean, one of the things we have to point out is, before you even got into that state, the cluster would be screaming bloody murder. It would be going, hey, I can’t run RF2, I’m not able to get to this compliance state, I need to have more memory. We have capacity planning in the cluster. So

172
00:22:33.750 –> 00:22:47.499
Ben Rogers: even though I know we’re kind of in the lab doing this, and we won’t see those warnings here, I don’t want any of our customers to think, oh, there’s no warning that this is coming, that these boxes just get into an unhealthy state.

173
00:22:47.870 –> 00:22:56.920
Jirah Cox: Totally, right. But yeah, the cluster itself, we’ve taught it lots of tricks over the years. It’ll email you. It can send out SNMP alerts. It can open up ServiceNow tickets, if you allow that.

174
00:22:57.000 –> 00:23:04.819
Jirah Cox: Lots of fun tricks it’s got to alert you: hey, we are entering a, you know, non-resilient threshold, right, of the cluster state here.

175
00:23:04.950 –> 00:23:15.170
Jirah Cox: If you allow the phone-home telemetry, right, we call it Pulse, so we can phone home with how healthy the cluster you’re running is, it can ping your account team, right? They’ll reach out.

176
00:23:15.300 –> 00:23:26.230
Jirah Cox: There’s a new trick that clusters learned last year, actually, where you can say, like, in this case, let’s say you’ve got a 7-node cluster, so therefore you shouldn’t write more than 6 nodes’ worth of data.

177
00:23:26.630 –> 00:23:30.530
Jirah Cox: You can basically tell it to only show you 6 nodes’ worth of data,

178
00:23:30.590 –> 00:23:43.749
Jirah Cox: and not even pretend that the seventh node exists, right? 100% is the 6-node line. So you can have it redraw all the graphs to say, this is what full looks like, and healthy is somewhere to the left of that.

179
00:23:43.780 –> 00:23:48.260
Jirah Cox: Don’t make me track whether I’m over or under my 6-node threshold.

180
00:23:48.860 –> 00:23:49.600
right
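The N-1 view Jirah describes (on a 7-node cluster, treat 6 nodes’ worth of storage as 100%) is simple arithmetic. The Python sketch below reproduces it under stated assumptions: node capacities are given in TiB, the worst case is losing the largest node, and the function names are hypothetical, not a Nutanix or Prism API.

```python
def resilient_capacity_tib(node_capacities_tib: list[float], tolerated_failures: int = 1) -> float:
    """Capacity that still exists after losing `tolerated_failures` nodes.

    Hypothetical helper: assumes the worst case is losing the largest node(s).
    """
    if not 0 <= tolerated_failures < len(node_capacities_tib):
        raise ValueError("must leave at least one surviving node")
    # Drop the largest nodes and sum what survives.
    survivors = sorted(node_capacities_tib)[: len(node_capacities_tib) - tolerated_failures]
    return sum(survivors)


def percent_of_resilient_capacity(used_tib: float, node_capacities_tib: list[float],
                                  tolerated_failures: int = 1) -> float:
    """Usage expressed against the N-1 line instead of raw cluster capacity."""
    return 100.0 * used_tib / resilient_capacity_tib(node_capacities_tib, tolerated_failures)


seven_nodes = [10.0] * 7  # hypothetical 7-node cluster, 10 TiB per node
print(resilient_capacity_tib(seven_nodes))
print(percent_of_resilient_capacity(45.0, seven_nodes))
```

For seven 10 TiB nodes the resilient line sits at 60 TiB, so 45 TiB of usage reads as 75% rather than the roughly 64% a raw 70 TiB view would show.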

181
00:23:50.500 –> 00:24:01.430
Andy Whiteside: So Jirah, this next section is: after 30 minutes of the CVM not being reachable, the node is being detached. In other words, it says, “Hey, I’m going to completely remove this guy from my storage.”

182
00:24:01.870 –> 00:24:04.790
Jirah Cox: Yep. So this is what I really like a lot. So we’ve already

183
00:24:04.910 –> 00:24:09.249
Jirah Cox: begun the process. At this point we’ve probably gotten close to completing the process of

184
00:24:09.270 –> 00:24:11.769
Jirah Cox: re-healing from the failure state.

185
00:24:11.890 –> 00:24:16.290
Jirah Cox: What I love about Nutanix and the cluster design in general is that it’s self-healing.

186
00:24:16.320 –> 00:24:24.890
Jirah Cox: So we don’t stay in a broken 7-node state for very long; we transition to a healthy 6-node state. Six nodes becomes the new normal.

187
00:24:24.960 –> 00:24:36.420
Jirah Cox: That’s all the nodes we have in the cluster, and all the nodes we know about, right? So we don’t stay in a degraded state. We eject the seventh node that’s failed from the metadata ring, right, from our knowledge of what nodes exist in the cluster,

188
00:24:36.510 –> 00:24:43.130
Jirah Cox: for a lot of reasons. One, we forget about it: there are some things we can clean up, and we’re not waiting for it to come back online.

189
00:24:43.760 –> 00:24:46.550
Jirah Cox: Another one: let’s say that node is down for a week.

190
00:24:46.720 –> 00:24:48.210
Jirah Cox: In a week, when it comes up,

191
00:24:48.330 –> 00:24:50.090
Jirah Cox: it has almost no useful data.

192
00:24:50.190 –> 00:25:04.420
Jirah Cox: So we don’t want to treat it like a prodigal node returning home. We’ll just treat it like it’s a new node and resync data over to it, rather than diffing: “Oh, what’s new? What’s changed? What’s not changed?”

193
00:25:04.430 –> 00:25:16.889
Jirah Cox: We’ll just cut bait, and then if it comes back, great: we will accept it back into the cluster as a fresh node, taking that node 7 spot, versus having to worry about differences in what data has changed or what hasn’t.

194
00:25:17.620 –> 00:25:21.220
Andy Whiteside: We always heal down to a healthy state. We don’t

195
00:25:21.250 –> 00:25:23.860
Jirah Cox: stay in a degraded state whenever we can help it.

196
00:25:24.050 –> 00:25:27.409
Andy Whiteside: Yeah, that allows you to sleep at night, not guessing what

197
00:25:27.630 –> 00:25:46.130
Ben Rogers: could be happening when you’re not there to watch it. Guys, any comments on that? Well, for me personally, this would definitely give me peace of mind, because I’ve left the office several times where you had a RAID 5 and one drive dropped out, and while you’re getting the drive shipped back to you,

198
00:25:46.140 –> 00:25:54.690
Ben Rogers: those couple of hours, you’re just praying to the IT gods, “Don’t let anything go wrong.” I mean, so

199
00:25:54.700 –> 00:26:19.929
Ben Rogers: for me to know that my technology would heal itself and assume, “Oh, the unit’s bad, we’re gonna go ahead and get it out of the mix, we’re gonna continue running in a healthy state, and if you want to bring that unit back in, great, but we’re going to treat it as a new unit,” that’s awesome, man. It definitely gives a good level of comfort that you don’t have to sit on pins and needles while things are being shipped to you, or procured, or any of those things that we’re all used to dealing with.

200
00:26:20.820 –> 00:26:21.570
No.

201
00:26:21.940 –> 00:26:30.079
Andy Whiteside: So Jeroen takes it a step further, I believe, and now he’s ready to show that he can take down another node and still be up and going.

202
00:26:30.410 –> 00:26:47.650
Jirah Cox: Yep. So the process here actually remains the same. You could keep on crashing nodes, one at a time, as long as you allow the re-heal to complete between each node failure. So you could go from 7 to 6, 6 to 5, 5 to 4,

203
00:26:48.660 –> 00:27:02.540
Jirah Cox: 4 down to 3, and he actually talks about what happens if you keep going until you only have 2 nodes left. At that point, under the covers, we have a metadata ring that is 3 nodes at minimum size for RF2.

204
00:27:02.550 –> 00:27:21.860
Jirah Cox: That’s about as failed as you can get: you could have a 3-node cluster that then shucks one more node and is down to running on 2 nodes out of 3, 2 legs out of 3 on the stool. That one won’t re-heal, because you can’t shrink down to a healthy 2-node cluster from a larger, like 7-node, cluster.

205
00:27:21.870 –> 00:27:32.430
Jirah Cox: 3 nodes is the minimum there. So once you hit that 3-node threshold and you lose one more node, you can run. You can definitely hit RF2 for customer data, right? All your data is written twice. In that case it’s

206
00:27:32.450 –> 00:27:37.790
Jirah Cox: fully mirrored, right? Everything’s on node 1 and node 2, and node 3 is down, out of sync.

207
00:27:38.380 –> 00:27:58.159
Jirah Cox: At that point, if you lost one more node, well, then, totally, you have no cluster left, right? You’re down to one node out of 7. That’s a non-survival situation: your data is safe, but it’s not going to run, not going to be operable, not going to be in an online state for the cluster. At that point you do have to go get some spare parts and bring at least one more node back online from your

208
00:27:58.170 –> 00:27:59.770
Jirah Cox: one node out of 7 state.
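The sequence Jirah walks through (7 to 6 to 5 to 4 to 3, then 2 of 3, then 1) can be modelled as a small simulation. This is a hedged sketch, not Nutanix logic: it assumes each re-heal finishes before the next failure and that capacity is never the limit, it uses the 3-node minimum metadata ring for RF2 mentioned in the episode, and `simulate_rf2_failures` is a hypothetical name.

```python
MIN_RF2_RING = 3  # minimum metadata-ring size for RF2, as stated in the episode


def simulate_rf2_failures(start_nodes: int, failures: int) -> tuple[int, int, str]:
    """Return (nodes_up, ring_size, status) after sequential single-node failures.

    Assumes every re-heal completes before the next failure and that
    capacity never runs out.
    """
    ring_size = start_nodes  # nodes the cluster currently considers members
    nodes_up = start_nodes
    for _ in range(failures):
        if nodes_up == 0:
            break
        nodes_up -= 1
        if nodes_up >= MIN_RF2_RING:
            # Failed node is detached; the smaller cluster becomes the new normal.
            ring_size = nodes_up
    if nodes_up == ring_size:
        status = "healthy"
    elif ring_size - nodes_up == 1:
        status = "degraded: online, but cannot re-heal"
    else:
        status = "offline: data safe, cluster not operable"
    return nodes_up, ring_size, status


for n in range(7):
    print(n, simulate_rf2_failures(7, n))
```

With a 7-node start, four failures leave a healthy 3-node cluster, a fifth leaves 2 of 3 nodes running without re-healing, and a sixth takes the cluster offline.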

209
00:28:00.660 –> 00:28:03.659
Andy Whiteside: And Jirah, was this all doable

210
00:28:03.880 –> 00:28:06.599
Andy Whiteside: because of, yes, the magic of Nutanix, plus

211
00:28:06.810 –> 00:28:10.190
Andy Whiteside: the fact that he was running at such a low capacity to begin with.

212
00:28:10.820 –> 00:28:16.049
Jirah Cox: Totally. So that’s the real limit that most customers will hit first, right? Unless you are only using what

213
00:28:16.260 –> 00:28:19.869
Jirah Cox: simple math would tell us, one seventh of your storage capacity,

214
00:28:20.010 –> 00:28:29.290
Jirah Cox: you’ll hit that first. Let’s say you’re using 3 nodes’ worth of storage. Then, as soon as you have to rebuild onto fewer than 3 nodes, the data doesn’t fit.

215
00:28:29.400 –> 00:28:31.890
Jirah Cox: We’re gonna call that cluster full.

216
00:28:32.190 –> 00:28:38.879
Jirah Cox: It’s going to go into a real read-only state. And at that point you’ve got to, you know, certainly lay hands on the hardware that’s failed and bring it back online.
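Capacity is the other floor Jirah describes. A minimal sketch of the “data doesn’t fit” check, assuming equal-sized nodes, that `used_tib` already counts both RF2 copies, and ignoring any reserved headroom or metadata overhead; `rf2_rebuilds_that_fit` is a hypothetical helper that models capacity only, so the 3-node metadata-ring minimum still applies separately.

```python
def rf2_rebuilds_that_fit(node_capacity_tib: float, start_nodes: int, used_tib: float) -> int:
    """Count sequential single-node failures whose re-heal still fits the data.

    Assumptions: equal-sized nodes; used_tib already includes both RF2 copies;
    no reserved headroom or metadata overhead is modelled.
    """
    nodes = start_nodes
    rebuilds = 0
    # A re-heal onto (nodes - 1) survivors only succeeds if the data still fits there.
    while nodes > 1 and used_tib <= (nodes - 1) * node_capacity_tib:
        nodes -= 1
        rebuilds += 1
    return rebuilds


# Hypothetical cluster: seven 10 TiB nodes holding 30 TiB ("3 nodes' worth").
print(rf2_rebuilds_that_fit(10.0, 7, 30.0))
```

Seven 10 TiB nodes holding 30 TiB can re-heal four times (down to 3 nodes); the next rebuild would need to fit 30 TiB onto 20 TiB, which is the point where the cluster is called full and goes read-only.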

217
00:28:39.020 –> 00:28:39.720
Yeah.

218
00:28:39.760 –> 00:28:40.470
okay.

219
00:28:41.200 –> 00:28:59.130
Jirah Cox: But yeah, you can fail nodes down to what I call the water level of the cluster. If the cluster is a bucket, you can lose the top of the bucket, and another slice of the bucket, and keep on losing slices down to however full it is. Once you get to that threshold, that’s gonna fill it up, and we’ll go read-only on that data.

220
00:29:00.370 –> 00:29:18.619
Andy Whiteside: And then bringing this back online, it’s just a matter of adding a node, adding a node, adding a node as they become healthy and you want to reintroduce them. They kind of come in as a foreign object, it looks like, do they? Totally. Yup. So there’s one click in Prism there to say, yeah, admit this node back into the cluster.

221
00:29:18.630 –> 00:29:29.290
Jirah Cox: If you’re unsure about it, right? As a software layer, we have, I think, a wise and somewhat healthy mistrust of hardware health.

222
00:29:29.390 –> 00:29:36.329
Jirah Cox: So if that node’s been flapping, up and down, it’s caused enough heartburn that we ejected it from the ring.

223
00:29:36.730 –> 00:29:55.410
Jirah Cox: We’re gonna make you tell us to trust that node again, rather than do that fully proactively, right? Maybe you’re testing out some flaky DIMM, or you’ve got some weird power going on in one cabinet in the data center. We’ll let you tell us when that storm has passed, and then we’ll admit that node back into the ring.
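The detach-then-admit flow described here can be pictured as a small state machine. This is an illustrative sketch only: the states, the `NodeRecord` class, and the way unreachable time is counted are assumptions. The only details taken from the episode are the 30-minute detach timeout, the manual admit step, and that a returning node is treated as a fresh, empty node rather than diffed.

```python
from dataclasses import dataclass
from enum import Enum, auto

DETACH_AFTER_MINUTES = 30  # CVM-unreachable timeout described in the episode


class NodeState(Enum):
    ACTIVE = auto()
    UNREACHABLE = auto()
    DETACHED = auto()


@dataclass
class NodeRecord:  # hypothetical bookkeeping, not a Nutanix data structure
    name: str
    state: NodeState = NodeState.ACTIVE
    unreachable_minutes: int = 0
    holds_replicas: bool = True

    def missed_heartbeats(self, minutes: int) -> None:
        """Accumulate time the node's CVM has been unreachable."""
        if self.state is NodeState.DETACHED:
            return  # already out of the ring; waiting for an operator
        self.state = NodeState.UNREACHABLE
        self.unreachable_minutes += minutes
        if self.unreachable_minutes >= DETACH_AFTER_MINUTES:
            self.state = NodeState.DETACHED
            # Its old copies are not trusted or diffed later; it will rejoin empty.
            self.holds_replicas = False

    def admit(self) -> None:
        """Operator action (one click in Prism, per the episode): trust the node again."""
        if self.state is not NodeState.DETACHED:
            raise ValueError(f"{self.name} is not detached")
        self.state = NodeState.ACTIVE
        self.unreachable_minutes = 0
        # holds_replicas stays False: the node is filled like a brand-new member.


node7 = NodeRecord("node7")
node7.missed_heartbeats(30)
node7.admit()
print(node7)
```

Keeping `admit` as an explicit call mirrors the point Jirah makes: software alone can’t tell when a flapping node is trustworthy again, so that decision stays with the operator.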

224
00:29:57.090 –> 00:30:01.619
Jirah Cox: So there are some things we can’t programmatically determine as software alone.

225
00:30:02.160 –> 00:30:11.160
Andy Whiteside: And so, to a large degree, it’s aware of that node and aware that it might come online, but it’s completely mitigated it for now until you tell it, “Hey, I’m ready for you to reconsider

226
00:30:11.940 –> 00:30:13.299
Andy Whiteside: bringing this guy back in.

227
00:30:13.940 –> 00:30:15.419
Jirah Cox: Yeah, that’s that’s fair.

228
00:30:15.830 –> 00:30:16.500
Okay.

229
00:30:17.320 –> 00:30:19.360
Andy Whiteside: Philip, I’ll go to you first.

230
00:30:19.590 –> 00:30:23.740
Andy Whiteside: Any additional questions, comments, thoughts, things you like to add.

231
00:30:23.810 –> 00:30:29.270
Philip Sellers: Yeah, I wanted to ask a little bit about exposure time. So, you know, we’ve got this 30-minute

232
00:30:29.400 –> 00:30:36.170
Philip Sellers: timeout with the CVM, where it gets ejected out of the metadata ring. So

233
00:30:36.690 –> 00:30:44.140
Philip Sellers: you’re really kind of sitting in an exposed state for that 30 minutes, or

234
00:30:44.380 –> 00:30:49.550
Philip Sellers: 30 minutes plus however long it takes to complete the rebuild, right?

235
00:30:49.630 –> 00:30:52.400
Jirah Cox: It’s a fantastic question. Actually, believe it or not, you don’t.

236
00:30:52.480 –> 00:30:57.360
Jirah Cox: So as soon as you fail the node, at the very next second,

237
00:30:57.470 –> 00:31:00.679
Jirah Cox: we’re actually starting the customer data rebuild immediately.

238
00:31:00.920 –> 00:31:08.479
Jirah Cox: And every new write as well. So if the VM generates new data on, let’s say, one of the 6 surviving nodes in the cluster,

239
00:31:08.550 –> 00:31:16.470
Jirah Cox: we’re immediately honoring that write onto 2 different nodes, right? Maybe it was initially going to be targeted to node 1 and node 7 for the 2 replica copies.

240
00:31:16.510 –> 00:31:28.649
Jirah Cox: But now it’ll be node 1 and node 6, or node 1 and node 5, or whatever it is. As the platform, we never accept a write from the VM that we can’t honor according to the replication factor, right? RF2 or RF3.

241
00:31:29.080 –> 00:31:39.450
Jirah Cox: So new data is immediately protected that way, and we immediately start the rebuild of existing customer data as well. So what it’s actually like, let’s just say hypothetically for this scenario:
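The write path Jirah describes (never acknowledge a write that can’t be placed on RF distinct healthy nodes) can be sketched as follows. The local-first choice and the random pick for the remaining replica are simplifying assumptions inferred from the “node 1 and node 7, now node 1 and node 6” example; real placement weighs many more factors, and `place_replicas` is a hypothetical name, not a Nutanix API.

```python
import random


def place_replicas(local_node: str, healthy_nodes: set[str], rf: int,
                   rng: random.Random) -> list[str]:
    """Pick rf distinct healthy nodes for a new write, or refuse the write.

    Simplified assumptions: the first copy stays on the VM's local node when it is
    healthy; remaining copies go to randomly chosen other healthy nodes.
    """
    if len(healthy_nodes) < rf:
        # Never acknowledge a write that cannot meet the replication factor.
        raise RuntimeError(f"only {len(healthy_nodes)} healthy nodes for RF{rf}")
    targets = [local_node] if local_node in healthy_nodes else []
    others = sorted(n for n in healthy_nodes if n not in targets)
    targets += rng.sample(others, rf - len(targets))
    return targets


# Node 7 has failed, so only nodes 1-6 are candidates.
healthy = {f"node{i}" for i in range(1, 7)}
print(place_replicas("node1", healthy, rf=2, rng=random.Random(42)))
```

Because the failed node is simply absent from `healthy_nodes`, new writes are redirected immediately; nothing waits for the node to be detached.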

242
00:31:39.490 –> 00:31:50.460
Jirah Cox: New writes are immediately protected according to the replication factor, and let’s say the rebuild finishes in 15 minutes. The other 15 minutes is simply about our confidence in the node itself

243
00:31:50.480 –> 00:31:58.840
Jirah Cox: before we eject it from the metadata ring. But that’s not really user-facing; it doesn’t add risk or cause exposure, to your point, Phil. Great question. Fantastic question.
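To make the answer to Philip’s exposure question concrete, here is a hypothetical timeline helper using the numbers from Jirah’s example (rebuild done at 15 minutes, detach at 30). It assumes, as Jirah states, that the rebuild starts at the moment of failure, so the exposure window ends when the rebuild completes, not when the detach timer fires. The function names and event strings are illustrative only.

```python
def failure_timeline(rebuild_minutes: float,
                     detach_after_minutes: float = 30.0) -> list[tuple[float, str]]:
    """Ordered (minute, event) pairs after one node fails in an RF2 cluster.

    Assumes the rebuild starts at minute 0 and runs independently of the detach timer.
    """
    events = [
        (0.0, "node fails: new writes go to 2 healthy nodes, rebuild starts"),
        (float(rebuild_minutes), "rebuild done: all data has 2 copies again"),
        (float(detach_after_minutes), "node detached from the metadata ring"),
    ]
    return sorted(events)


def exposure_minutes(timeline: list[tuple[float, str]]) -> float:
    # Data that lost a replica is exposed only until the rebuild finishes;
    # the detach timer does not extend this window.
    return next(t for t, event in timeline if event.startswith("rebuild done"))


for minute, event in failure_timeline(15):
    print(f"{minute:>5.1f} min  {event}")
print("exposure:", exposure_minutes(failure_timeline(15)), "min")
```

If a rebuild ran longer than 30 minutes, the detach event would simply sort before it; the exposure window would still equal the rebuild duration.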

244
00:31:58.870 –> 00:32:17.419
Philip Sellers: And that makes a ton of sense, too, because as long as the rebuild is in place, all your bits are protected. Yeah. So there are parts of this that are totally inside baseball, under the covers, under the hood, I should say, that we’re being super transparent about, and this is all in the Nutanix Bible as well, right?

245
00:32:17.430 –> 00:32:30.989
Jirah Cox: in terms of what layers of data power the system and contribute to availability. But yeah, it’s always been an operating thesis that, you know, we have to be absolutely paranoid about

246
00:32:31.000 –> 00:32:43.470
Jirah Cox: customer data integrity, right? Otherwise we’re just useless as a platform, and no one should trust us. So job one has always been to be a good steward of customer data, and that includes immediate rebuild, with no delay timers before we start the rebuild. We never,

247
00:32:43.520 –> 00:33:00.409
Jirah Cox: we almost never... Our view of the world is that we never assume the hardware is gonna come back, so we’ll never delay a rebuild hoping it will, trying to make our life easier. We’ll take the harder path of rebuilding now. And worst case, if that node does come back because it was just a transient power failure or a reboot,

248
00:33:00.480 –> 00:33:11.259
Jirah Cox: well, then, worst case we’ve over-protected some data, and we can do that garbage cleanup, right? But we’ll never let the customer run exposed, hoping and praying that hardware comes back when it might not.

249
00:33:12.140 –> 00:33:17.899
Philip Sellers: Yeah, the other thing, and just to reiterate what you said: you know, you’ve got

250
00:33:18.370 –> 00:33:33.199
Philip Sellers: this concept of the local cache in every node, and with VDI, one of the things that really makes it a great platform for running VDI is having that local copy on every single one of your nodes, especially for non-persistent desktops.

251
00:33:33.370 –> 00:33:35.280
Philip Sellers: They’re right there.

252
00:33:35.510 –> 00:33:44.679
Philip Sellers: You know, it’s never going to be faster than that, right? As he says, you’re never going to get food faster than in your own kitchen. So

253
00:33:45.110 –> 00:33:49.220
Philip Sellers: it’s one of those great platform trade-offs, and

254
00:33:49.310 –> 00:34:03.790
Philip Sellers: you said it earlier, but I just wanted to re-emphasize it, because it is one of the great use cases, and why we like to do VDI on Nutanix: architecturally, there are things there that help us.

255
00:34:05.950 –> 00:34:18.300
Andy Whiteside: Yeah, there are things there that we’ve wanted forever. It just took Nutanix showing up to make it to where that was handled at an under-the-surface layer, so we could get on with brokering connections and

256
00:34:18.750 –> 00:34:20.830
Andy Whiteside: enabling user experience.

257
00:34:22.960 –> 00:34:25.689
Ben Rogers: You know what the best

258
00:34:26.790 –> 00:34:39.430
Ben Rogers: best thing about AHV is? Well, one of them is the cost, the inclusion of it. But no, what you’re highlighting...

259
00:34:39.550 –> 00:34:44.779
Ben Rogers: So, you know, for our customers, let’s forget all about the prolong and all that.

260
00:34:44.810 –> 00:35:00.130
Ben Rogers: We’re going into, you know, a little bit of a recession. Times are going to get a little tight. A lot of customers are looking at this as a way: could I get my budget a little skinnier and utilize something that I already own, versus trying to reinvent the wheel with a secondary product?

261
00:35:02.120 –> 00:35:12.999
Andy Whiteside: Yeah, I totally get it. I totally agree. I mean, in the beginning there was an argument that was true, but not a no-brainer. But as Philip pointed out a while ago, this thing that

262
00:35:13.190 –> 00:35:24.020
Andy Whiteside: you guys at Nutanix have created, this cost-effective platform that now has all these additional services bolted onto it, has really expanded that story. It’s not free

263
00:35:24.100 –> 00:35:29.539
Ben Rogers: No. Nobody gets it for free, but it is included in your AOS license,

264
00:35:30.130 –> 00:35:30.770
right?

265
00:35:31.290 –> 00:35:34.720
Jirah Cox: Totally. Yeah. Plenty of opportunity there to simplify. I mean

266
00:35:34.760 –> 00:35:35.809
Jirah Cox: I mean

267
00:35:36.140 –> 00:35:47.330
Jirah Cox: zooming out for a second, KVM is one of the most widely deployed hypervisors in the world, right? The real trick we taught it here, in addition to speaking to, you know, CVMs for high-speed local storage,

268
00:35:47.370 –> 00:35:59.409
Jirah Cox: is manageability, right? We brought it into the family. You manage it with Prism. If you already know how to run Nutanix, managing AHV is kind of a non-issue, right? You run it like any other cluster.

269
00:35:59.490 –> 00:36:02.620
Jirah Cox: So, simplicity included.

270
00:36:03.730 –> 00:36:10.370
Philip Sellers: Yeah, it’s funny, Andy, you asked about the enable HA reservation checkbox. You know,

271
00:36:10.560 –> 00:36:30.449
Philip Sellers: there’s a lot here, as I’ve explored the Nutanix platform, that reminds me of the early days of VMware, where they made clustering easy. You just went in and checked the box, and then you had a cluster. And I remember doing Windows cluster builds prior to that which took 2 weeks to set up and configure and get all the bugs worked out.

272
00:36:30.480 –> 00:36:46.710
Philip Sellers: There’s a lot of that simplicity in the platform, too. Nutanix is doing a good job of delivering something complex under the covers in a simple way, and operating it in a simple way.

273
00:36:46.730 –> 00:36:53.059
Philip Sellers: Those are things, I think, that resonate with technologists when they get to see

274
00:36:53.210 –> 00:36:56.059
Philip Sellers: what’s being delivered to them.

275
00:36:56.190 –> 00:36:59.699
Philip Sellers: Yeah, I’ve looked at other hypervisors, one from a

276
00:37:00.640 –> 00:37:10.499
Philip Sellers: large software company that might write operating systems, and, you know, it’s clunky. It takes some

277
00:37:10.680 –> 00:37:19.239
Philip Sellers: extra steps. It’s that old-school clustering that you know from Windows. It’s the same on Hyper-V, and

278
00:37:19.390 –> 00:37:26.160
Philip Sellers: there’s something to be said about the simple message and delivery that Nutanix is doing here.

279
00:37:26.860 –> 00:37:37.849
Andy Whiteside: I think one of the things Nutanix has going for it is that it was born with these hyperconverged processes and concepts and constructs from day one.

280
00:37:38.240 –> 00:37:43.420
Andy Whiteside: that wasn’t something that had to, you know, find its way into the solution. It was born that way.

281
00:37:43.530 –> 00:37:50.110
Andy Whiteside: And, you know, sometimes it’s just helpful to be born at a time when the future is there, versus having to adapt to it.

282
00:37:50.860 –> 00:37:53.120
Jirah Cox: Yeah, that’s pretty fair. I mean, the

283
00:37:53.360 –> 00:37:57.720
Jirah Cox: point I like to make, which was taught to me years ago, is that

284
00:37:57.920 –> 00:38:16.470
Jirah Cox: any system with sufficient ability to run and solve business problems like this has complexity in it, right? It’s just a design question: do we ask users to bear that complexity, or do we bury that complexity so they get a simpler experience? There’s complexity being abstracted away under the hood.

285
00:38:16.540 –> 00:38:32.449
Jirah Cox: I heard one time that we evaluate any given VM on like 7 or 12 different metrics to determine where it’s going to land, right? So there are plenty of decisions being made under the hood, but we just don’t ask users to wade into the thick of that and make those for us. If we can abstract it, we will.

286
00:38:32.870 –> 00:38:33.430
Right.

287
00:38:35.370 –> 00:38:40.269
Andy Whiteside: Well, guys, we’re more or less out of time. This has been a good conversation. Ben,

288
00:38:40.520 –> 00:38:41.899
Andy Whiteside: any additional

289
00:38:41.940 –> 00:38:43.319
Andy Whiteside: thoughts, comments?

290
00:38:43.580 –> 00:38:53.000
Ben Rogers: No, I kind of go back to what was said at the beginning of the podcast, man. This is a platform. It’s not just a hyperconverged system anymore,

291
00:38:53.010 –> 00:39:10.749
Ben Rogers: and all the other products that we have kind of hinge off this idea of, you know, spreading data across the cluster, the data resiliency, all these things. So if you guys want to learn more, reach out to us. It’s an exciting time to be at Nutanix. We’re taking this technology to the cloud now.

292
00:39:10.810 –> 00:39:20.549
Ben Rogers: That’s opening up some doors for us, so it’s a really good time to be employed at Nutanix, a really good time to be running Nutanix, and I look forward to what the future brings for us.

293
00:39:22.180 –> 00:39:24.799
Andy Whiteside: Harvey, what did we miss that you want to cover?

294
00:39:25.070 –> 00:39:26.959
Harvey Green: well, since you

295
00:39:27.030 –> 00:39:40.090
Harvey Green: time-stamp and date-stamp the episode at the start, I’ll just remind everybody that this week is a third-Friday week. So we’re doing our workshop on Friday from

296
00:39:40.270 –> 00:39:46.130
Harvey Green: 11 to 2, and this week it’s Nutanix Database Service. So

297
00:39:46.200 –> 00:39:48.009
Harvey Green: definitely jump on.

298
00:39:48.960 –> 00:39:50.640
Andy Whiteside: Formerly known as...

299
00:39:50.820 –> 00:39:52.730
Harvey Green: Formerly known as Era.

300
00:39:52.950 –> 00:39:56.609
Jirah Cox: And that’s a fun workshop, right? That’s like,

301
00:39:56.750 –> 00:40:01.490
Jirah Cox: that goes so far beyond this kind of VM management and data rebuilding. It’s like

302
00:40:01.650 –> 00:40:08.479
Jirah Cox: batteries included, right? What can the platform really do for you to help streamline a lot of day-2 operations?

303
00:40:08.540 –> 00:40:27.529
Harvey Green: Yeah. I laugh at one of my friends, because he always tells me I come back with the phrase “What else can you do?” every time he brings something up. This is one of those occasions: we’ve just gone through this entire podcast on some of the things that are underpinning the entire platform,

304
00:40:27.630 –> 00:40:33.300
Harvey Green: and then it’s like, “Well, what else can you do?” Well, on Friday you’ll see more

305
00:40:34.190 –> 00:40:36.209
where that came from.

306
00:40:38.080 –> 00:40:42.290
Philip Sellers: No, I’ll be on the same workshop with

307
00:40:42.590 –> 00:40:50.670
Harvey Green: with Harvey on Friday. We’d love to see you. You know, Phil’s gonna do all the work this time. He’s gonna sit there and relax.

308
00:40:51.620 –> 00:40:55.640
Andy Whiteside: I have no doubt it will go just fine. Jirah, one last chance, anything?

309
00:41:01.640 –> 00:41:21.150
Jirah Cox: I shouldn’t be blanking on that. Oh, well, we announced .NEXT in Chicago, right? Definitely check out the site for that, talk to your account teams, tell them you want to get some passes, and we’d love to see you in Chicago as .NEXT leaves the virtual world and goes back to the physical, or something like that. The analogy breaks down.

310
00:41:21.470 –> 00:41:33.129
Andy Whiteside: Well, and if you’re looking for passes, if you renew your existing licenses with XenTegra, there are passes in it. We’re going to be giving away, I think, 20 passes. I think we plan to give

311
00:41:33.690 –> 00:41:42.400
Jirah Cox: We plan to give away both the Broncos that we’re giving away as part of Work Has No Boundaries at the event. That’s right. There you go, man. I want to be a XenTegra customer.

312
00:41:42.510 –> 00:41:55.430
Harvey Green: Hey, you should be. My fear is you wouldn’t need us. But that’s okay. All right, guys, we appreciate you joining and doing this with us, and we’ll do it again in a week or 2.

313
00:41:56.560 –> 00:41:59.020
Jirah Cox: Sounds good. All right, thanks, everybody.
