Lex Fridman Podcast - Jim Keller

00:00:00,000 --> 00:00:03,020 The following is a conversation with Jim Keller,
00:00:03,020 --> 00:00:05,560 legendary microprocessor engineer
00:00:05,560 --> 00:00:10,160 who has worked at AMD, Apple, Tesla, and now Intel.
00:00:10,160 --> 00:00:13,540 He's known for his work on AMD K7, K8, K12,
00:00:13,540 --> 00:00:18,140 and Zen microarchitectures, Apple A4 and A5 processors,
00:00:18,140 --> 00:00:20,120 and co-author of the specification
00:00:20,120 --> 00:00:23,080 for the x86-64 instruction set
00:00:23,080 --> 00:00:26,160 and HyperTransport Interconnect.
00:00:26,160 --> 00:00:28,480 He's a brilliant first principles engineer
00:00:28,480 --> 00:00:30,080 and out-of-the-box thinker,
00:00:30,080 --> 00:00:33,520 and just an interesting and fun human being to talk to.
00:00:33,520 --> 00:00:36,520 This is the Artificial Intelligence Podcast.
00:00:36,520 --> 00:00:38,880 If you enjoy it, subscribe on YouTube,
00:00:38,880 --> 00:00:40,880 give it five stars on Apple Podcast,
00:00:40,880 --> 00:00:43,520 follow on Spotify, support it on Patreon,
00:00:43,520 --> 00:00:45,640 or simply connect with me on Twitter,
00:00:45,640 --> 00:00:49,580 at Lex Fridman, spelled F-R-I-D-M-A-N.
00:00:49,580 --> 00:00:51,080 I recently started doing ads
00:00:51,080 --> 00:00:52,640 at the end of the introduction.
00:00:52,640 --> 00:00:55,600 I'll do one or two minutes after introducing the episode
00:00:55,600 --> 00:00:57,120 and never any ads in the middle
00:00:57,120 --> 00:00:59,420 that can break the flow of the conversation.
00:00:59,420 --> 00:01:00,800 I hope that works for you
00:01:00,800 --> 00:01:04,080 and doesn't hurt the listening experience.
00:01:04,080 --> 00:01:06,200 This show is presented by Cash App,
00:01:06,200 --> 00:01:08,680 the number one finance app in the App Store.
00:01:08,680 --> 00:01:11,480 I personally use Cash App to send money to friends,
00:01:11,480 --> 00:01:13,240 but you can also use it to buy, sell,
00:01:13,240 --> 00:01:15,640 and deposit Bitcoin in just seconds.
00:01:15,640 --> 00:01:18,520 Cash App also has a new investing feature.
00:01:18,520 --> 00:01:21,460 You can buy fractions of a stock, say $1 worth,
00:01:21,460 --> 00:01:23,580 no matter what the stock price is.
00:01:23,580 --> 00:01:26,520 Broker services are provided by Cash App Investing,
00:01:26,520 --> 00:01:29,800 a subsidiary of Square and member SIPC.
00:01:29,800 --> 00:01:32,120 I'm excited to be working with Cash App
00:01:32,120 --> 00:01:35,520 to support one of my favorite organizations called FIRST,
00:01:35,520 --> 00:01:39,040 best known for their FIRST Robotics and LEGO competitions.
00:01:39,040 --> 00:01:42,320 They educate and inspire hundreds of thousands of students
00:01:42,320 --> 00:01:44,200 in over 110 countries
00:01:44,200 --> 00:01:46,800 and have a perfect rating at Charity Navigator,
00:01:46,800 --> 00:01:48,080 which means that donated money
00:01:48,080 --> 00:01:50,840 is used to maximum effectiveness.
00:01:50,840 --> 00:01:53,560 When you get Cash App from the App Store or Google Play
00:01:53,560 --> 00:01:56,360 and use code LEXPODCAST,
00:01:56,360 --> 00:02:00,360 you'll get $10 and Cash App will also donate $10 to FIRST,
00:02:00,360 --> 00:02:02,200 which again is an organization
00:02:02,200 --> 00:02:05,000 that I've personally seen inspire girls and boys
00:02:05,000 --> 00:02:08,140 to dream of engineering a better world.
00:02:08,140 --> 00:02:11,520 And now here's my conversation with Jim Keller.
00:02:12,640 --> 00:02:14,600 What are the differences and similarities
00:02:14,600 --> 00:02:17,280 between the human brain and a computer
00:02:17,280 --> 00:02:19,320 with the microprocessor at its core?
00:02:19,320 --> 00:02:22,340 Let's start with the philosophical question perhaps.
00:02:22,340 --> 00:02:25,480 Well, since people don't actually understand
00:02:25,480 --> 00:02:29,280 how human brains work, I think that's true.
00:02:29,280 --> 00:02:30,640 I think that's true.
00:02:30,640 --> 00:02:32,680 So it's hard to compare them.
00:02:32,680 --> 00:02:37,340 Computers are, you know, there's really two things.
00:02:37,340 --> 00:02:40,560 There's memory and there's computation, right?
00:02:40,560 --> 00:02:44,000 And to date, almost all computer architectures
00:02:44,000 --> 00:02:47,680 are global memory, which is a thing, right?
00:02:47,680 --> 00:02:49,440 And then computation where you pull data
00:02:49,440 --> 00:02:52,480 and you do relatively simple operations on it
00:02:52,480 --> 00:02:53,960 and write data back.
00:02:53,960 --> 00:02:57,800 So it's decoupled in modern computers.
00:02:57,800 --> 00:02:59,880 And you think in the human brain,
00:02:59,880 --> 00:03:02,640 everything's a mesh, a mess that's combined together?
00:03:02,640 --> 00:03:04,880 What people observe is there's, you know,
00:03:04,880 --> 00:03:06,520 some number of layers of neurons
00:03:06,520 --> 00:03:09,160 which have local and global connections
00:03:09,160 --> 00:03:12,880 and information is stored in some distributed fashion
00:03:13,720 --> 00:03:18,320 and people build things called neural networks in computers
00:03:18,320 --> 00:03:21,240 where the information is distributed
00:03:21,240 --> 00:03:22,880 in some kind of fashion.
00:03:22,880 --> 00:03:25,600 You know, there's a mathematics behind it.
00:03:25,600 --> 00:03:29,280 I don't know that the understanding of that is super deep.
00:03:29,280 --> 00:03:31,240 The computations we run on those
00:03:31,240 --> 00:03:33,520 are straightforward computations.
00:03:33,520 --> 00:03:35,600 I don't believe anybody has said
00:03:35,600 --> 00:03:37,960 a neuron does this computation.
00:03:37,960 --> 00:03:42,960 So to date, it's hard to compare them, I would say.
00:03:44,200 --> 00:03:48,880 So let's get into the basics before we zoom back out.
00:03:48,880 --> 00:03:51,100 How do you build a computer from scratch?
00:03:51,100 --> 00:03:52,840 What is a microprocessor?
00:03:52,840 --> 00:03:54,200 What is a microarchitecture?
00:03:54,200 --> 00:03:56,720 What's an instruction set architecture?
00:03:56,720 --> 00:03:59,540 Maybe even as far back as what is a transistor?
00:04:01,120 --> 00:04:05,120 So the special charm of computer engineering
00:04:05,120 --> 00:04:08,480 is there's a relatively good understanding
00:04:08,480 --> 00:04:10,520 of abstraction layers.
00:04:10,520 --> 00:04:12,360 So down at the bottom, you have atoms
00:04:12,360 --> 00:04:14,360 and atoms get put together in materials
00:04:14,360 --> 00:04:17,520 like silicon or doped silicon or metal
00:04:17,520 --> 00:04:19,480 and we build transistors.
00:04:19,480 --> 00:04:23,720 On top of that, we build logic gates, right?
00:04:23,720 --> 00:04:27,440 And then functional units like an adder or a subtractor
00:04:27,440 --> 00:04:28,840 or an instruction parsing unit
00:04:28,840 --> 00:04:32,360 and then we assemble those into processing elements.
00:04:32,360 --> 00:04:37,280 Modern computers are built out of probably 10 to 20
00:04:37,280 --> 00:04:41,000 locally organized processing elements
00:04:41,000 --> 00:04:42,680 or coherent processing elements
00:04:42,680 --> 00:04:46,680 and then that runs computer programs, right?
00:04:46,680 --> 00:04:49,840 So there's abstraction layers and then software,
00:04:49,840 --> 00:04:51,840 there's an instruction set you run
00:04:51,840 --> 00:04:56,500 and then there's assembly language, C, C++, Java, JavaScript,
00:04:56,500 --> 00:05:00,040 there's abstraction layers essentially from the atom
00:05:00,040 --> 00:05:02,600 to the data center, right?
00:05:02,600 --> 00:05:07,600 So when you build a computer, first there's a target,
00:05:07,760 --> 00:05:08,600 like what's it for?
00:05:08,600 --> 00:05:10,040 Like how fast does it have to be?
00:05:10,040 --> 00:05:12,280 Which today there's a whole bunch of metrics
00:05:12,280 --> 00:05:13,880 about what that is.
00:05:13,880 --> 00:05:17,080 And then in an organization of 1,000 people
00:05:17,080 --> 00:05:22,080 who build a computer, there's lots of different disciplines
00:05:22,280 --> 00:05:24,160 that you have to operate on.
00:05:24,160 --> 00:05:25,520 Does that make sense?
00:05:25,520 --> 00:05:27,160 And so.
00:05:27,160 --> 00:05:29,280 So there's a bunch of levels of abstraction
00:05:30,800 --> 00:05:35,760 in an organization like Intel and in your own vision,
00:05:35,760 --> 00:05:37,640 there's a lot of brilliance that comes in
00:05:37,640 --> 00:05:39,720 at every one of those layers.
00:05:39,720 --> 00:05:41,720 Some of it is science, some of it is engineering,
00:05:41,720 --> 00:05:45,480 some of it is art, what's the most,
00:05:45,480 --> 00:05:47,800 if you could pick favorites, what's the most important,
00:05:47,800 --> 00:05:51,120 your favorite layer on these layers of abstractions?
00:05:51,120 --> 00:05:53,980 Where does the magic enter this hierarchy?
00:05:55,400 --> 00:05:57,160 I don't really care.
00:05:57,160 --> 00:06:00,760 That's the, you know, I'm somewhat agnostic to that.
00:06:00,760 --> 00:06:05,560 So I would say for relatively long periods of time,
00:06:05,560 --> 00:06:08,080 instruction sets are stable.
00:06:08,080 --> 00:06:12,040 So the x86 instruction set, the ARM instruction set.
00:06:12,040 --> 00:06:13,400 What's an instruction set?
00:06:13,400 --> 00:06:16,160 So it says, how do you encode the basic operations?
00:06:16,160 --> 00:06:20,200 Load, store, multiply, add, subtract, conditional branch.
00:06:20,200 --> 00:06:23,840 You know, there aren't that many interesting instructions.
00:06:23,840 --> 00:06:26,200 Look, if you look at a program and it runs,
00:06:26,200 --> 00:06:29,880 you know, 90% of the execution is on 25 opcodes,
00:06:29,880 --> 00:06:31,720 you know, 25 instructions.
00:06:31,720 --> 00:06:33,960 And those are stable, right?
00:06:33,960 --> 00:06:35,520 What does it mean, stable?
00:06:35,520 --> 00:06:38,160 Intel architecture's been around for 25 years.
00:06:38,160 --> 00:06:39,000 It works.
00:06:39,000 --> 00:06:39,840 It works.
00:06:39,840 --> 00:06:42,560 And that's because the basics, you know,
00:06:42,560 --> 00:06:45,320 are defined a long time ago, right?
00:06:45,320 --> 00:06:48,760 Now, the way an old computer ran
00:06:48,760 --> 00:06:53,000 is you fetched instructions and you executed them in order.
00:06:53,000 --> 00:06:56,160 Do the load, do the add, do the compare.
00:06:57,180 --> 00:06:58,920 The way a modern computer works
00:06:58,920 --> 00:07:03,320 is you fetch large numbers of instructions, say 500,
00:07:03,320 --> 00:07:06,280 and then you find the dependency graph
00:07:06,280 --> 00:07:07,960 between the instructions.
00:07:07,960 --> 00:07:12,320 And then you execute in independent units
00:07:12,320 --> 00:07:14,440 those little micrographs.
00:07:15,320 --> 00:07:17,800 So a modern computer, like people like to say,
00:07:17,800 --> 00:07:20,740 computers should be simple and clean.
00:07:20,740 --> 00:07:22,440 But it turns out the market for simple,
00:07:22,440 --> 00:07:26,280 clean, slow computers is zero, right?
00:07:26,280 --> 00:07:29,600 We don't sell any simple, clean computers.
00:07:29,600 --> 00:07:33,560 No, how you build it can be clean,
00:07:33,560 --> 00:07:36,680 but the computer people want to buy,
00:07:36,680 --> 00:07:40,440 that's, say, in a phone or a data center,
00:07:40,440 --> 00:07:42,680 fetches a large number of instructions,
00:07:42,680 --> 00:07:45,600 computes the dependency graph,
00:07:45,600 --> 00:07:49,160 and then executes it in a way that gets the right answers.
00:07:49,160 --> 00:07:50,880 And optimizes that graph somehow.
00:07:50,880 --> 00:07:53,560 Yeah, they run deeply out of order,
00:07:53,560 --> 00:07:57,580 and then there's semantics around how memory ordering works
00:07:57,580 --> 00:07:58,420 and other things work.
00:07:58,420 --> 00:08:02,000 So the computer sort of has a bunch of bookkeeping tables
00:08:02,000 --> 00:08:05,560 that says what order should these operations finish in
00:08:05,560 --> 00:08:07,840 or appear to finish in?
00:08:07,840 --> 00:08:10,740 But to go fast, you have to fetch a lot of instructions
00:08:10,740 --> 00:08:12,760 and find all the parallelism.
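The fetch-and-find-the-dependency-graph step described above can be sketched as a toy model. This illustrative Python (the `schedule` helper and register names are invented for the example; real hardware also renames registers and tracks memory ordering) just levels a window of instructions by their register dependencies:

```python
# Toy model of "found parallelism": level a window of instructions by
# their register dependencies. Real hardware also renames registers and
# orders memory operations; this sketch tracks only register producers.

def schedule(window):
    """window: list of (dest, [sources]) in program order.
    Returns the earliest parallel issue level for each instruction."""
    last_writer = {}  # register name -> index of the instruction that wrote it
    level = []
    for i, (dest, srcs) in enumerate(window):
        deps = [last_writer[s] for s in srcs if s in last_writer]
        # Issue one level after the latest producer (level 0 if independent).
        level.append(1 + max((level[d] for d in deps), default=-1))
        last_writer[dest] = i
    return level

# a = load; b = load; c = a + b; d = a * 2
window = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["a"])]
print(schedule(window))  # -> [0, 0, 1, 1]: two instructions per level
```

Instructions on the same level have no chain of dependencies between them, so in this toy model they could execute in parallel, which is exactly the parallelism an out-of-order core hunts for in its window.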
00:08:12,760 --> 00:08:15,520 Now, there's a second kind of computer,
00:08:15,520 --> 00:08:19,680 which we call GPUs today, and here's the difference.
00:08:19,680 --> 00:08:21,920 There's found parallelism, like you have a program
00:08:21,920 --> 00:08:24,160 with a lot of dependent instructions.
00:08:24,160 --> 00:08:26,160 You fetch a bunch, and then you go figure out
00:08:26,160 --> 00:08:29,440 the dependency graph, and you issue instructions out of order.
00:08:29,440 --> 00:08:33,000 That's because you have one serial narrative to execute,
00:08:33,000 --> 00:08:35,880 which, in fact, can be done out of order.
00:08:35,880 --> 00:08:37,100 Did you call it a narrative?
00:08:37,100 --> 00:08:37,940 Yeah.
00:08:37,940 --> 00:08:38,780 Oh, wow.
00:08:38,780 --> 00:08:40,720 So, yeah, so humans think of serial narrative.
00:08:40,720 --> 00:08:43,000 So read a book, right?
00:08:43,000 --> 00:08:45,800 There's a sentence after sentence after sentence,
00:08:45,800 --> 00:08:46,880 and there's paragraphs.
00:08:46,880 --> 00:08:49,400 Now, you could diagram that.
00:08:49,400 --> 00:08:51,860 Imagine you diagrammed it properly, and you said,
00:08:52,720 --> 00:08:55,660 which sentences could be read in any order,
00:08:55,660 --> 00:08:59,100 any order without changing the meaning, right?
00:09:00,000 --> 00:09:02,560 That's a fascinating question to ask of a book, yeah.
00:09:02,560 --> 00:09:04,440 Yeah, you could do that, right?
00:09:04,440 --> 00:09:06,320 So some paragraphs could be reordered,
00:09:06,320 --> 00:09:08,440 some sentences can be reordered.
00:09:08,440 --> 00:09:13,440 You could say, he is tall and smart and X, right?
00:09:15,680 --> 00:09:18,260 And it doesn't matter the order of tall and smart.
00:09:19,880 --> 00:09:22,960 But if you say the tall man is wearing a red shirt,
00:09:22,960 --> 00:09:27,960 what color is the shirt? You can create dependencies, right?
00:09:28,480 --> 00:09:32,040 And so GPUs, on the other hand,
00:09:32,040 --> 00:09:35,360 run simple programs on pixels.
00:09:35,360 --> 00:09:36,920 But you're given a million of them,
00:09:36,920 --> 00:09:40,180 and the first order, the screen you're looking at,
00:09:40,180 --> 00:09:42,220 doesn't care which order you do it in.
00:09:42,220 --> 00:09:44,500 So I call that given parallelism.
00:09:44,500 --> 00:09:48,320 Simple narratives around the large numbers of things,
00:09:48,320 --> 00:09:49,440 where you can just say,
00:09:49,440 --> 00:09:52,360 it's parallel because you told me it was.
00:09:52,360 --> 00:09:57,360 So found parallelism where the narrative is sequential,
00:09:57,720 --> 00:10:01,800 but you discover like little pockets of parallelism versus.
00:10:01,800 --> 00:10:04,000 Turns out large pockets of parallelism.
00:10:04,000 --> 00:10:05,920 Large, so how hard is it to discover?
00:10:05,920 --> 00:10:06,980 Well, how hard is it?
00:10:06,980 --> 00:10:08,840 That's just transistor count, right?
00:10:08,840 --> 00:10:11,160 So once you crack the problem, you say,
00:10:11,160 --> 00:10:13,480 here's how you fetch 10 instructions at a time,
00:10:13,480 --> 00:10:16,400 here's how you calculate the dependencies between them,
00:10:16,400 --> 00:10:18,520 here's how you describe the dependencies,
00:10:18,520 --> 00:10:20,700 here's, you know, these are pieces, right?
00:10:20,700 --> 00:10:25,620 So once you describe the dependencies,
00:10:25,620 --> 00:10:27,980 then it's just a graph, sort of,
00:10:27,980 --> 00:10:31,940 it's an algorithm that finds, what is that?
00:10:31,940 --> 00:10:32,940 I'm sure there's a graph here,
00:10:32,940 --> 00:10:35,900 it's a theoretical answer here that's solvable.
00:10:35,900 --> 00:10:40,780 In general, programs, modern programs,
00:10:40,780 --> 00:10:42,300 that human beings write,
00:10:42,300 --> 00:10:45,300 how much found parallelism is there in them?
00:10:45,300 --> 00:10:47,140 About 10x. What does 10x mean?
00:10:47,140 --> 00:10:52,140 So if you execute it in order, you would get
00:10:52,220 --> 00:10:53,980 what's called cycles per instruction,
00:10:53,980 --> 00:10:58,260 and it would be about, you know, three instructions,
00:10:58,260 --> 00:11:00,040 three cycles per instruction,
00:11:00,040 --> 00:11:02,820 because of the latency of the operations and stuff.
00:11:02,820 --> 00:11:04,540 And in a modern computer,
00:11:04,540 --> 00:11:08,740 you execute at, like, 0.2, 0.25 cycles per instruction.
00:11:08,740 --> 00:11:11,860 So it's about, we today find 10x.
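The 10x figure follows directly from the two cycles-per-instruction numbers just quoted; a quick back-of-the-envelope check (nothing more than the arithmetic implied above):

```python
# In-order execution: ~3 cycles per instruction.
# Modern out-of-order execution: ~0.25 cycles per instruction.
in_order_cpi = 3.0
out_of_order_cpi = 0.25
speedup = in_order_cpi / out_of_order_cpi
print(speedup)  # -> 12.0, i.e. roughly the "10x" of found parallelism
```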
00:11:11,860 --> 00:11:13,040 And there's two things,
00:11:13,040 --> 00:11:17,380 one is the found parallelism in the narrative, right?
00:11:17,380 --> 00:11:21,420 And the other is the predictability of the narrative, right?
00:11:21,420 --> 00:11:25,560 So certain operations say, do a bunch of calculations,
00:11:25,560 --> 00:11:27,740 and if greater than one, do this,
00:11:27,740 --> 00:11:32,140 else do that, that decision is predicted
00:11:32,140 --> 00:11:36,260 in modern computers to high 90% accuracy.
00:11:36,260 --> 00:11:38,760 So branches happen a lot.
00:11:38,760 --> 00:11:40,460 So imagine you have a decision
00:11:40,460 --> 00:11:41,820 to make every six instructions,
00:11:41,820 --> 00:11:43,780 which is about the average, right?
00:11:43,780 --> 00:11:45,480 But you want to fetch 500 instructions,
00:11:45,480 --> 00:11:48,460 figure out the graph, and execute them all in parallel.
00:11:48,460 --> 00:11:51,620 That means you have, let's say,
00:11:51,620 --> 00:11:55,020 if you fetch 600 instructions, and it's every six,
00:11:55,020 --> 00:11:56,980 you have to fetch, you have to predict
00:11:56,980 --> 00:11:59,420 99 out of 100 branches correctly
00:12:00,300 --> 00:12:02,360 for that window to be effective.
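The arithmetic behind that claim, as a small sketch using the figures quoted above (a 600-instruction window, a branch about every six instructions; the 85% and 92% accuracies anticipate numbers that come up a bit later in the conversation):

```python
# With a branch about every 6 instructions, a 600-instruction window
# holds about 100 branches. For the whole window to be on the correct
# path, every one of them must be predicted right.
window, branch_every = 600, 6
branches = window // branch_every
print(branches)  # -> 100

for accuracy in (0.85, 0.92, 0.99):
    # probability the entire window survives with no mispredict
    print(accuracy, accuracy ** branches)
```

At 85% or 92% per-branch accuracy essentially no 600-instruction window survives intact, while at 99% roughly a third do, which is why per-branch accuracy has to be pushed so high for big windows to be effective.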
00:12:02,360 --> 00:12:06,900 Okay, so parallelism, you can't parallelize branches.
00:12:06,900 --> 00:12:07,740 Or you can.
00:12:07,740 --> 00:12:09,140 No, you can predict, you can predict.
00:12:09,140 --> 00:12:10,620 What does predict a branch mean?
00:12:10,620 --> 00:12:11,460 Or what does predict a branch mean?
00:12:11,460 --> 00:12:13,600 So imagine you do a computation over and over.
00:12:13,600 --> 00:12:14,960 You're in a loop.
00:12:14,960 --> 00:12:19,460 So while n is greater than one, do.
00:12:19,460 --> 00:12:21,260 And you go through that loop a million times.
00:12:21,260 --> 00:12:22,680 So every time you look at the branch,
00:12:22,680 --> 00:12:25,780 you say, it's probably still greater than one.
00:12:25,780 --> 00:12:27,860 And you're saying you could do that accurately.
00:12:27,860 --> 00:12:28,700 Very accurately.
00:12:28,700 --> 00:12:29,520 Modern computers.
00:12:29,520 --> 00:12:30,540 My mind is blown.
00:12:30,540 --> 00:12:31,500 How the heck do you do that?
00:12:31,500 --> 00:12:32,640 Wait a minute.
00:12:32,640 --> 00:12:33,860 Well, you want to know?
00:12:33,860 --> 00:12:35,540 This is really sad.
00:12:35,540 --> 00:12:38,740 20 years ago, you simply recorded
00:12:38,740 --> 00:12:40,660 which way the branch went last time
00:12:40,660 --> 00:12:42,820 and predicted the same thing.
00:12:42,820 --> 00:12:43,660 Right.
00:12:43,660 --> 00:12:44,480 Okay.
00:12:44,480 --> 00:12:46,180 What's the accuracy of that?
00:12:46,180 --> 00:12:48,140 85%.
00:12:48,140 --> 00:12:51,820 So then somebody said, hey, let's keep a couple of bits
00:12:51,820 --> 00:12:55,020 and have a little counter so when it predicts one way,
00:12:55,020 --> 00:12:56,760 we count up, and then it pins.
00:12:56,760 --> 00:12:58,100 So say you have a three bit counter.
00:12:58,100 --> 00:13:00,780 So you count up and then you count down.
00:13:00,780 --> 00:13:03,300 And if it's, you can use the top bit as the signed bit.
00:13:03,300 --> 00:13:05,060 So you have a signed two bit number.
00:13:05,060 --> 00:13:07,500 So if it's greater than one, you predict taken.
00:13:07,500 --> 00:13:10,420 And less than one, you predict not taken.
00:13:10,420 --> 00:13:11,460 Right?
00:13:11,460 --> 00:13:14,100 Or less than zero, whatever the thing is.
00:13:14,100 --> 00:13:16,140 And that got us to 92%.
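That counter idea can be sketched in its common two-bit saturating form (a slight simplification of the three-bit, sign-bit version described above; the class and variable names are illustrative):

```python
# A two-bit saturating counter: count up on taken, down on not-taken,
# pin at the ends of the range, and predict from the top half.
# Real predictors keep a table of these indexed by branch address;
# this toy models a single branch.

class TwoBitCounter:
    def __init__(self):
        self.state = 2          # 0..3; start at "weakly taken"

    def predict(self):
        return self.state >= 2  # top half of the range predicts taken

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)  # saturate, don't wrap
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 999 times and then falling through:
predictor = TwoBitCounter()
misses = 0
for taken in [True] * 999 + [False]:
    if predictor.predict() != taken:
        misses += 1
    predictor.update(taken)
print(misses)  # -> 1 (a single miss when the loop finally exits)
```

Because the counter saturates at "strongly taken," the one not-taken outcome at loop exit doesn't flip its prediction for the next run of the loop, which is what lifts accuracy over the record-the-last-outcome scheme.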
00:13:16,140 --> 00:13:17,300 Oh.
00:13:17,300 --> 00:13:19,540 Okay, no, it gets better.
00:13:19,540 --> 00:13:22,900 This branch depends on how you got there.
00:13:22,900 --> 00:13:25,540 So if you came down the code one way,
00:13:25,540 --> 00:13:28,420 you're talking about Bob and Jane, right?
00:13:28,420 --> 00:13:30,460 And then said, does Bob like Jane?
00:13:30,460 --> 00:13:31,300 It went one way.
00:13:31,300 --> 00:13:32,900 But if you're talking about Bob and Jill,
00:13:32,900 --> 00:13:33,940 does Bob like Jane?
00:13:33,940 --> 00:13:35,540 You go a different way.
00:13:35,540 --> 00:13:36,380 Right?
00:13:36,380 --> 00:13:37,200 So that's called history.
00:13:37,200 --> 00:13:38,900 So you take the history and a counter.
00:13:38,900 --> 00:13:40,040 Mm-hmm.
00:13:40,040 --> 00:13:41,360 That's cool.
00:13:41,360 --> 00:13:43,400 But that's not how anything works today.
00:13:43,400 --> 00:13:46,400 They use something that looks a little like a neural network.
00:13:48,040 --> 00:13:51,240 So modern, you take all the execution flows
00:13:52,240 --> 00:13:56,120 and then you do basically deep pattern recognition
00:13:56,120 --> 00:13:58,520 of how the program is executing.
00:13:59,920 --> 00:14:03,740 And you do that multiple different ways.
00:14:03,740 --> 00:14:06,680 And you have something that chooses what the best result is.
00:14:06,680 --> 00:14:10,440 There's a little supercomputer inside the computer.
00:14:10,440 --> 00:14:11,880 That's trying to predict branching.
00:14:11,880 --> 00:14:14,340 That calculates which way branches go.
00:14:14,340 --> 00:14:15,840 So the effective window
00:14:15,840 --> 00:14:18,360 that it's worth finding graphs in gets bigger.
00:14:19,280 --> 00:14:21,880 Why was that gonna make me sad?
00:14:21,880 --> 00:14:22,920 Because that's amazing.
00:14:22,920 --> 00:14:24,420 It's amazingly complicated.
00:14:24,420 --> 00:14:25,260 Oh, well.
00:14:25,260 --> 00:14:27,080 Well, here's the funny thing.
00:14:27,080 --> 00:14:31,740 So to get to 85% took 1,000 bits.
00:14:31,740 --> 00:14:36,740 To get to 99% takes tens of megabits.
00:14:38,900 --> 00:14:42,740 So this is one of those, to get the result,
00:14:42,740 --> 00:14:47,740 to get from a window of, say, 50 instructions to 500,
00:14:47,800 --> 00:14:49,520 it took three orders of magnitude
00:14:49,520 --> 00:14:51,620 or four orders of magnitude more bits.
00:14:52,720 --> 00:14:55,500 Now, if you get the prediction of a branch wrong,
00:14:55,500 --> 00:14:56,340 what happens then?
00:14:56,340 --> 00:14:57,420 You flush the pipe.
00:14:57,420 --> 00:14:59,580 You flush the pipe, so it's just the performance cost.
00:14:59,580 --> 00:15:00,840 But it gets even better.
00:15:00,840 --> 00:15:01,680 Yeah.
00:15:01,680 --> 00:15:03,900 So we're starting to look at stuff that says,
00:15:03,900 --> 00:15:06,740 so they executed down this path,
00:15:06,740 --> 00:15:09,300 and then you had two ways to go.
00:15:09,300 --> 00:15:11,900 But far away, there's something
00:15:11,900 --> 00:15:14,700 that doesn't matter which path you went.
00:15:14,700 --> 00:15:17,700 So you took the wrong path.
00:15:17,700 --> 00:15:19,340 You executed a bunch of stuff.
00:15:20,620 --> 00:15:21,740 Then you had the misprediction.
00:15:21,740 --> 00:15:22,580 You backed it up.
00:15:22,580 --> 00:15:25,540 You remembered all the results you already calculated.
00:15:25,540 --> 00:15:27,700 Some of those are just fine.
00:15:27,700 --> 00:15:30,300 Like if you read a book and you misunderstand a paragraph,
00:15:30,300 --> 00:15:32,540 your understanding of the next paragraph
00:15:32,540 --> 00:15:35,780 sometimes is invariant to that understanding.
00:15:35,780 --> 00:15:37,640 Sometimes it depends on it.
00:15:38,580 --> 00:15:43,300 And you can kind of anticipate that invariance.
00:15:43,300 --> 00:15:47,380 Yeah, well, you can keep track of whether the data changed.
00:15:47,380 --> 00:15:49,260 And so when you come back through a piece of code,
00:15:49,260 --> 00:15:51,900 should you calculate it again or do the same thing?
00:15:51,900 --> 00:15:55,660 Okay, how much of this is art and how much of it is science?
00:15:55,660 --> 00:15:59,140 Because it sounds pretty complicated.
00:15:59,140 --> 00:16:00,700 Well, how do you describe a situation?
00:16:00,700 --> 00:16:02,660 So imagine you come to a point in the road
00:16:02,660 --> 00:16:05,180 where you have to make a decision, right?
00:16:05,180 --> 00:16:07,100 And you have a bunch of knowledge about which way to go.
00:16:07,100 --> 00:16:08,940 Maybe you have a map.
00:16:08,940 --> 00:16:11,620 So you want to go the shortest way,
00:16:11,620 --> 00:16:13,220 or do you want to go the fastest way,
00:16:13,220 --> 00:16:14,860 or do you want to take the nicest road?
00:16:14,860 --> 00:16:17,900 So there's some set of data.
00:16:17,900 --> 00:16:19,700 So imagine you're doing something complicated
00:16:19,700 --> 00:16:21,860 like building a computer,
00:16:21,860 --> 00:16:24,380 and there's hundreds of decision points,
00:16:24,380 --> 00:16:27,780 all with hundreds of possible ways to go.
00:16:27,780 --> 00:16:30,940 And the ways you pick interact in a complicated way.
00:16:32,220 --> 00:16:33,460 Right.
00:16:33,460 --> 00:16:35,700 And then you have to pick the right spot.
00:16:35,700 --> 00:16:36,540 Right, so that sounds like...
00:16:36,540 --> 00:16:37,580 So that's art or science, I don't know.
00:16:37,580 --> 00:16:38,940 You avoided the question.
00:16:38,940 --> 00:16:41,380 You just described the Robert Frost problem
00:16:41,380 --> 00:16:42,620 of road less taken.
00:16:43,660 --> 00:16:45,700 Describe the Robert Frost problem?
00:16:45,700 --> 00:16:49,460 That's what we do as computer designers.
00:16:49,460 --> 00:16:50,420 It's all poetry.
00:16:50,420 --> 00:16:51,260 Okay.
00:16:51,260 --> 00:16:52,100 Great.
00:16:52,100 --> 00:16:54,220 Yeah, I don't know how to describe that
00:16:54,220 --> 00:16:56,420 because some people are very good
00:16:56,420 --> 00:16:57,980 at making those intuitive leaps.
00:16:57,980 --> 00:17:00,580 It seems like just combinations of things.
00:17:00,580 --> 00:17:02,220 Some people are less good at it,
00:17:02,220 --> 00:17:06,060 but they're really good at evaluating alternatives, right?
00:17:06,060 --> 00:17:09,300 And everybody has a different way to do it.
00:17:09,300 --> 00:17:11,900 And some people can't make those leaps,
00:17:11,900 --> 00:17:14,340 but they're really good at analyzing it.
00:17:14,340 --> 00:17:16,060 So when you see computers are designed
00:17:16,060 --> 00:17:19,300 by teams of people who have very different skill sets,
00:17:19,300 --> 00:17:24,300 and a good team has lots of different kinds of people.
00:17:24,500 --> 00:17:26,300 I suspect you would describe some of them
00:17:26,300 --> 00:17:29,380 as artistic, but not very many.
00:17:30,460 --> 00:17:32,100 Unfortunately, or fortunately.
00:17:32,100 --> 00:17:33,740 Fortunately.
00:17:33,740 --> 00:17:36,500 Well, you know, computer design's hard.
00:17:36,500 --> 00:17:39,500 It's 99% perspiration.
00:17:40,460 --> 00:17:43,340 And the 1% inspiration is really important.
00:17:44,180 --> 00:17:45,900 But you still need the 99.
00:17:45,900 --> 00:17:47,380 Yeah, you gotta do a lot of work.
00:17:47,380 --> 00:17:50,820 And then there are interesting things to do
00:17:50,820 --> 00:17:52,800 at every level of that stack.
00:17:52,800 --> 00:17:55,760 So at the end of the day,
00:17:55,760 --> 00:17:58,920 if you run the same program multiple times,
00:17:58,920 --> 00:18:01,500 does it always produce the same result?
00:18:01,500 --> 00:18:04,760 Is there some room for fuzziness there?
00:18:04,760 --> 00:18:05,880 That's a math problem.
00:18:06,760 --> 00:18:08,600 So if you run a correct C program,
00:18:08,600 --> 00:18:11,520 the definition is every time you run it,
00:18:11,520 --> 00:18:12,520 you get the same answer.
00:18:12,520 --> 00:18:14,520 Yeah, well, that's a math statement.
00:18:14,520 --> 00:18:17,480 Well, that's a language definitional statement.
00:18:17,480 --> 00:18:22,480 So for years, when we first did 3D acceleration of graphics,
00:18:24,640 --> 00:18:27,320 you could run the same scene multiple times
00:18:27,320 --> 00:18:30,280 and get different answers, right?
00:18:30,280 --> 00:18:32,400 And then some people thought that was okay,
00:18:32,400 --> 00:18:34,600 and some people thought it was a bad idea.
00:18:34,600 --> 00:18:39,280 And then when the HPC world used GPUs for calculations,
00:18:39,280 --> 00:18:42,160 they thought it was a really bad idea, okay?
00:18:42,160 --> 00:18:44,480 Now, in modern AI stuff,
00:18:44,480 --> 00:18:48,200 people are looking at networks
00:18:48,200 --> 00:18:51,120 where the precision of the data is low enough
00:18:51,120 --> 00:18:53,720 that the data is somewhat noisy.
00:18:53,720 --> 00:18:57,360 And the observation is the input data is unbelievably noisy.
00:18:57,360 --> 00:19:00,280 So why should the calculation be not noisy?
00:19:00,280 --> 00:19:02,280 And people have experimented with algorithms
00:19:02,280 --> 00:19:06,000 that say can get faster answers by being noisy.
00:19:06,000 --> 00:19:08,320 Like as a network starts to converge,
00:19:08,320 --> 00:19:09,640 if you look at the computation graph,
00:19:09,640 --> 00:19:12,200 it starts out really wide and then it gets narrower.
00:19:12,200 --> 00:19:14,480 And you can say, is that last little bit that important,
00:19:14,480 --> 00:19:17,720 or should I start the graph on the next rev
00:19:17,720 --> 00:19:21,320 before we whittle it all the way down to the answer, right?
00:19:21,320 --> 00:19:24,080 So you can create algorithms that are noisy.
00:19:24,080 --> 00:19:25,480 Now, if you're developing something
00:19:25,480 --> 00:19:27,480 and every time you run it, you get a different answer,
00:19:27,480 --> 00:19:29,320 it's really annoying.
00:19:29,320 --> 00:19:33,960 And so most people think even today,
00:19:33,960 --> 00:19:36,760 every time you run the program, you get the same answer.
00:19:36,760 --> 00:19:38,400 No, I know, but the question is,
00:19:38,400 --> 00:19:42,440 that's the formal definition of a programming language.
00:19:42,440 --> 00:19:44,560 There is a definition of languages
00:19:44,560 --> 00:19:47,400 that don't get the same answer, but people who use those.
00:19:48,400 --> 00:19:50,800 You always want something because you get a bad answer
00:19:50,800 --> 00:19:53,280 and then you're wondering, is it because
00:19:53,280 --> 00:19:55,400 of something in the algorithm or because of this?
00:19:55,400 --> 00:19:57,180 And so everybody wants a little switch that says,
00:19:57,180 --> 00:20:00,320 no matter what, do it deterministically.
00:20:00,320 --> 00:20:02,440 And it's really weird because almost everything
00:20:02,440 --> 00:20:05,360 going into modern calculations is noisy.
00:20:05,360 --> 00:20:07,680 So why do the answers have to be so clear?
00:20:07,680 --> 00:20:08,520 It's-
00:20:08,520 --> 00:20:09,640 Right, so where do you stand?
00:20:09,640 --> 00:20:12,520 I design computers for people who run programs.
00:20:12,520 --> 00:20:16,920 So if somebody says, I want a deterministic answer,
00:20:16,920 --> 00:20:18,400 like most people want that.
00:20:18,400 --> 00:20:20,200 Can you deliver a deterministic answer,
00:20:20,200 --> 00:20:21,480 I guess is the question.
00:20:21,480 --> 00:20:22,320 Like when you-
00:20:22,320 --> 00:20:24,080 Yeah, hopefully, sure.
00:20:24,080 --> 00:20:24,920 That-
00:20:24,920 --> 00:20:27,320 What people don't realize is you get a deterministic answer
00:20:27,320 --> 00:20:31,120 even though the execution flow is very nondeterministic.
00:20:31,120 --> 00:20:33,140 So you run this program 100 times,
00:20:33,140 --> 00:20:36,120 it never runs the same way twice, ever.
00:20:36,120 --> 00:20:38,000 And yet it arrives at the same answer.
00:20:38,000 --> 00:20:39,240 But it gets the same answer every time.
00:20:39,240 --> 00:20:42,040 It's just amazing.
00:20:42,040 --> 00:20:47,040 Okay, you've achieved in the eyes of many people,
00:20:49,640 --> 00:20:53,040 legend status as a chip architect.
00:20:53,040 --> 00:20:56,440 What design creation are you most proud of?
00:20:56,440 --> 00:20:59,480 Perhaps because it was challenging,
00:20:59,480 --> 00:21:01,860 because of its impact or because of the set
00:21:01,860 --> 00:21:06,840 of brilliant ideas that were involved in bringing it to life.
00:21:06,840 --> 00:21:10,120 Well, I find that description odd.
00:21:10,120 --> 00:21:12,520 And I have two small children and I promise you,
00:21:14,400 --> 00:21:16,000 they think it's hilarious.
00:21:16,000 --> 00:21:16,840 This question.
00:21:16,840 --> 00:21:17,660 Yeah, so-
00:21:17,660 --> 00:21:18,500 I do it for them.
00:21:18,500 --> 00:21:22,480 So I'm really interested in building computers.
00:21:23,360 --> 00:21:27,680 And I've worked with really, really smart people.
00:21:27,680 --> 00:21:29,240 I'm not unbelievably smart.
00:21:29,240 --> 00:21:32,160 I'm fascinated by how they go together,
00:21:32,160 --> 00:21:37,160 both as a thing to do and as an endeavor that people do.
00:21:38,320 --> 00:21:40,080 How people and computers go together?
00:21:40,080 --> 00:21:43,080 Yeah, like how people think and build a computer.
00:21:44,240 --> 00:21:47,860 And I find sometimes that the best computer architects
00:21:47,860 --> 00:21:49,280 aren't that interested in people
00:21:49,280 --> 00:21:51,840 or the best people managers aren't that good
00:21:51,840 --> 00:21:53,320 at designing computers.
00:21:54,460 --> 00:21:56,920 So the whole stack of human beings is fascinating.
00:21:56,920 --> 00:22:00,000 So the managers, the individual engineers.
00:22:00,000 --> 00:22:03,740 Yeah, I realized after a lot of years of building computers,
00:22:03,740 --> 00:22:05,280 we sort of build them out of transistors,
00:22:05,280 --> 00:22:08,640 logic gates, functional units, computational elements,
00:22:08,640 --> 00:22:10,800 that you could think of people the same way.
00:22:10,800 --> 00:22:12,720 So people are functional units.
00:22:12,720 --> 00:22:14,600 And then you could think of organizational design
00:22:14,600 --> 00:22:16,960 as a computer architecture problem.
00:22:16,960 --> 00:22:19,360 And then it's like, oh, that's super cool
00:22:19,360 --> 00:22:20,760 because the people are all different,
00:22:20,760 --> 00:22:23,760 just like the computational elements are all different.
00:22:23,760 --> 00:22:25,640 And they like to do different things.
00:22:25,640 --> 00:22:29,240 And so I had a lot of fun reframing
00:22:29,240 --> 00:22:31,320 how I think about organizations.
00:22:31,320 --> 00:22:36,020 Just like with computers, we were saying execution paths,
00:22:36,020 --> 00:22:37,380 you can have a lot of different paths
00:22:37,380 --> 00:22:41,660 that end up at the same good destination.
00:22:41,660 --> 00:22:45,840 So what have you learned about the human abstractions
00:22:45,840 --> 00:22:48,900 from individual functional human units
00:22:48,900 --> 00:22:51,920 to the broader organization?
00:22:51,920 --> 00:22:55,080 What does it take to create something special?
00:22:55,080 --> 00:22:58,800 Well, most people don't think simple enough.
00:23:00,360 --> 00:23:01,700 All right, so do you know the difference
00:23:01,700 --> 00:23:04,200 between a recipe and the understanding?
00:23:06,360 --> 00:23:09,220 There's probably a philosophical description of this.
00:23:09,220 --> 00:23:11,520 So imagine you're gonna make a loaf of bread.
00:23:11,520 --> 00:23:14,120 The recipe says, get some flour, add some water,
00:23:14,120 --> 00:23:16,860 add some yeast, mix it up, let it rise,
00:23:16,860 --> 00:23:19,440 put it in a pan, put it in the oven.
00:23:19,440 --> 00:23:21,400 It's a recipe, right?
00:23:21,400 --> 00:23:24,780 Understanding bread, you can understand biology,
00:23:24,780 --> 00:23:29,780 supply chains, grain grinders, yeast, physics,
00:23:32,780 --> 00:23:35,640 thermodynamics, there's so many levels
00:23:35,640 --> 00:23:37,300 of understanding there.
00:23:37,300 --> 00:23:40,280 And then when people build and design things,
00:23:40,280 --> 00:23:45,240 they frequently are executing some stack of recipes, right?
00:23:45,240 --> 00:23:46,980 And the problem with that is the recipes
00:23:46,980 --> 00:23:48,960 all have limited scope.
00:23:48,960 --> 00:23:50,720 Like if you have a really good recipe book
00:23:50,720 --> 00:23:52,360 for making bread, it won't tell you anything
00:23:52,360 --> 00:23:54,880 about how to make an omelet, right?
00:23:54,880 --> 00:23:59,260 But if you have a deep understanding of cooking, right?
00:23:59,260 --> 00:24:04,260 Then bread, omelets, sandwich, there's a different way
00:24:05,720 --> 00:24:07,760 of viewing everything.
00:24:07,760 --> 00:24:12,300 And most people, when you get to be an expert at something,
00:24:13,520 --> 00:24:16,440 you're hoping to achieve deeper understanding,
00:24:16,440 --> 00:24:20,000 not just a large set of recipes to go execute.
00:24:20,000 --> 00:24:22,840 And it's interesting to watch groups of people
00:24:22,840 --> 00:24:26,640 because executing recipes is unbelievably efficient
00:24:27,640 --> 00:24:29,240 if it's what you want to do.
00:24:30,540 --> 00:24:33,540 If it's not what you want to do, you're really stuck.
00:24:34,840 --> 00:24:36,640 And that difference is crucial.
00:24:36,640 --> 00:24:39,520 And everybody has a balance of, let's say,
00:24:39,520 --> 00:24:41,000 deep understanding and recipes.
00:24:41,000 --> 00:24:43,800 And some people are really good at recognizing
00:24:43,800 --> 00:24:46,440 when the problem is to understand something deeply.
00:24:47,760 --> 00:24:49,080 Does that make sense?
00:24:49,080 --> 00:24:50,600 It totally makes sense.
00:24:50,600 --> 00:24:52,800 At every stage of development,
00:24:52,800 --> 00:24:55,600 is deep understanding needed on the team?
00:24:55,600 --> 00:24:58,660 Well, this goes back to the art versus science question.
00:24:58,660 --> 00:24:59,500 Sure.
00:24:59,500 --> 00:25:01,280 If you constantly unpack everything
00:25:01,280 --> 00:25:04,240 for deeper understanding, you never get anything done.
00:25:04,240 --> 00:25:06,920 And if you don't unpack understanding when you need to,
00:25:06,920 --> 00:25:08,500 you'll do the wrong thing.
00:25:09,520 --> 00:25:12,060 And then at every juncture, like human beings
00:25:12,060 --> 00:25:15,160 are these really weird things because everything you tell
00:25:15,160 --> 00:25:18,360 them has a million possible outputs, right?
00:25:18,360 --> 00:25:21,120 And then they all interact in a hilarious way.
00:25:21,120 --> 00:25:21,960 Yeah, it's very nice.
00:25:21,960 --> 00:25:24,280 And then having some intuition about what you tell them,
00:25:24,280 --> 00:25:26,720 what you do, when do you intervene, when do you not,
00:25:26,720 --> 00:25:28,760 it's complicated.
00:25:28,760 --> 00:25:29,800 Right, so.
00:25:29,800 --> 00:25:33,240 It's essentially computationally unsolvable.
00:25:33,240 --> 00:25:35,360 Yeah, it's an intractable problem, sure.
00:25:36,680 --> 00:25:38,000 Humans are a mess.
00:25:38,000 --> 00:25:41,840 But with deep understanding,
00:25:41,840 --> 00:25:44,600 do you mean also sort of fundamental questions
00:25:44,600 --> 00:25:49,600 of things like what is a computer?
00:25:51,400 --> 00:25:55,040 Or why, like the why questions,
00:25:55,040 --> 00:25:58,800 why are we even building this, like of purpose?
00:25:58,800 --> 00:26:02,240 Or do you mean more like going towards
00:26:02,240 --> 00:26:04,320 the fundamental limits of physics,
00:26:04,320 --> 00:26:07,280 sort of really getting into the core of the science?
00:26:07,280 --> 00:26:09,560 Well, in terms of building a computer,
00:26:09,560 --> 00:26:11,400 think a little simpler.
00:26:11,400 --> 00:26:14,680 So common practice is you build a computer,
00:26:14,680 --> 00:26:17,800 and then when somebody says I wanna make it 10% faster,
00:26:17,800 --> 00:26:19,280 you'll go in and say, all right,
00:26:19,280 --> 00:26:20,880 I need to make this buffer bigger,
00:26:20,880 --> 00:26:23,020 and maybe I'll add an add unit.
00:26:23,020 --> 00:26:25,400 Or I have this thing that's three instructions wide,
00:26:25,400 --> 00:26:27,640 I'm gonna make it four instructions wide.
00:26:27,640 --> 00:26:31,480 And what you see is each piece gets incrementally
00:26:31,480 --> 00:26:34,240 more complicated, right?
00:26:34,240 --> 00:26:37,120 And then at some point you hit this limit,
00:26:37,120 --> 00:26:39,740 like adding another feature or buffer doesn't seem
00:26:39,740 --> 00:26:41,220 to make it any faster.
00:26:41,220 --> 00:26:42,800 And then people will say, well, that's because
00:26:42,800 --> 00:26:45,420 it's a fundamental limit.
00:26:45,420 --> 00:26:46,960 And then somebody else will look at it and say,
00:26:46,960 --> 00:26:49,440 well, actually the way you divided the problem up,
00:26:49,440 --> 00:26:52,000 and the way the different features are interacting
00:26:52,000 --> 00:26:55,040 is limiting you, and it has to be rethought, rewritten.
00:26:56,280 --> 00:26:58,160 So then you refactor it and rewrite it,
00:26:58,160 --> 00:27:00,960 and what people commonly find is the rewrite
00:27:00,960 --> 00:27:03,600 is not only faster, but half as complicated.
00:27:03,600 --> 00:27:04,440 From scratch?
00:27:04,440 --> 00:27:05,260 Yes.
00:27:05,260 --> 00:27:08,920 So how often in your career, or just how often
00:27:08,920 --> 00:27:11,600 have you seen it as needed, maybe more generally,
00:27:11,600 --> 00:27:14,760 to just throw the whole thing out and start over?
00:27:14,760 --> 00:27:17,080 This is where I'm on one end of it,
00:27:17,080 --> 00:27:19,160 every three to five years.
00:27:19,160 --> 00:27:21,120 Which end are you on?
00:27:21,120 --> 00:27:22,760 Rewrite more often.
00:27:22,760 --> 00:27:25,240 Rewrite, and three to five years is?
00:27:25,240 --> 00:27:27,020 If you wanna really make a lot of progress
00:27:27,020 --> 00:27:28,980 on computer architecture, every five years
00:27:28,980 --> 00:27:30,520 you should do one from scratch.
00:27:32,000 --> 00:27:36,960 So where does the x86-64 standard come in?
00:27:36,960 --> 00:27:38,800 How often do you?
00:27:38,800 --> 00:27:42,400 I wrote the, I was the co-author of that spec in 98.
00:27:42,400 --> 00:27:43,920 That's 20 years ago.
00:27:43,920 --> 00:27:45,920 Yeah, so that's still around.
00:27:45,920 --> 00:27:48,320 The instruction set itself has been extended
00:27:48,320 --> 00:27:50,040 quite a few times.
00:27:50,040 --> 00:27:52,520 And instruction sets are less interesting
00:27:52,520 --> 00:27:54,800 than the implementation underneath.
00:27:54,800 --> 00:27:58,720 There's been, on x86 architecture, Intel's designed a few,
00:27:58,720 --> 00:28:02,560 AMD's designed a few very different architectures.
00:28:02,560 --> 00:28:06,560 And I don't wanna go into too much of the detail
00:28:06,560 --> 00:28:10,680 about how often, but there's a tendency
00:28:10,680 --> 00:28:12,620 to rewrite it every 10 years,
00:28:12,620 --> 00:28:14,320 and it really should be every five.
00:28:15,240 --> 00:28:17,960 So you're saying you're an outlier in that sense in the.
00:28:17,960 --> 00:28:19,040 Rewrite more often.
00:28:19,040 --> 00:28:20,160 Rewrite more often.
00:28:20,160 --> 00:28:21,000 Well, and here's the problem.
00:28:21,000 --> 00:28:22,200 Isn't that scary?
00:28:22,200 --> 00:28:23,760 Yeah, of course.
00:28:23,760 --> 00:28:25,260 Well, scary to who?
00:28:25,260 --> 00:28:28,240 To everybody involved, because like you said,
00:28:28,240 --> 00:28:30,760 repeating the recipe is efficient.
00:28:30,760 --> 00:28:34,560 Companies wanna make money, no,
00:28:34,560 --> 00:28:36,400 individual engineers wanna succeed,
00:28:36,400 --> 00:28:39,080 so you wanna incrementally improve,
00:28:39,080 --> 00:28:41,360 increase the buffer from three to four.
00:28:41,360 --> 00:28:43,400 Well, this is where you get into
00:28:43,400 --> 00:28:45,500 diminishing return curves.
00:28:45,500 --> 00:28:47,000 I think Steve Jobs said this, right?
00:28:47,000 --> 00:28:49,940 So every, you have a project, and you start here,
00:28:49,940 --> 00:28:52,440 and it goes up, and they have diminishing return.
00:28:52,440 --> 00:28:54,840 And to get to the next level, you have to do a new one,
00:28:54,840 --> 00:28:57,720 and the initial starting point will be lower
00:28:57,720 --> 00:29:01,920 than the old optimization point, but it'll get higher.
00:29:01,920 --> 00:29:03,640 So now you have two kinds of fear,
00:29:03,640 --> 00:29:07,600 short-term disaster and long-term disaster.
00:29:07,600 --> 00:29:08,680 And you're haunted.
00:29:08,680 --> 00:29:11,160 So grown-ups, right?
00:29:11,160 --> 00:29:12,000 Yes.
00:29:12,000 --> 00:29:13,880 Like, you know, people with a quarter-by-quarter
00:29:13,880 --> 00:29:17,920 business objective are terrified about changing everything.
00:29:17,920 --> 00:29:21,120 And people who are trying to run a business
00:29:21,120 --> 00:29:24,040 or build a computer for a long-term objective
00:29:24,040 --> 00:29:26,640 know that the short-term limitations
00:29:26,640 --> 00:29:29,440 block them from the long-term success.
00:29:29,440 --> 00:29:32,800 So if you look at leaders of companies
00:29:32,800 --> 00:29:35,280 that had really good long-term success,
00:29:35,280 --> 00:29:39,080 every time they saw that they had to redo something, they did.
00:29:39,080 --> 00:29:41,120 And so somebody has to speak up.
00:29:41,120 --> 00:29:43,160 Or you do multiple projects in parallel.
00:29:43,160 --> 00:29:46,780 Like, you optimize the old one while you build a new one.
00:29:46,780 --> 00:29:48,240 But the marketing guys are always like,
00:29:48,240 --> 00:29:50,040 make, promise me that the new computer
00:29:50,040 --> 00:29:52,800 is faster on every single thing.
00:29:52,800 --> 00:29:53,980 And the computer architect says,
00:29:53,980 --> 00:29:56,800 well, the new computer will be faster on the average.
00:29:56,800 --> 00:29:59,540 But there's a distribution of results and performance,
00:29:59,540 --> 00:30:01,960 and you'll have some outliers that are slower.
00:30:01,960 --> 00:30:02,800 And that's very hard,
00:30:02,800 --> 00:30:05,320 because they have one customer who cares about that one.
00:30:05,320 --> 00:30:09,000 So speaking of the long-term, for over 50 years now,
00:30:09,000 --> 00:30:12,920 Moore's Law has served, for me and millions of others,
00:30:12,920 --> 00:30:16,680 as an inspiring beacon of what kind of amazing future
00:30:16,680 --> 00:30:18,080 brilliant engineers can build.
00:30:18,080 --> 00:30:19,400 Yep.
00:30:19,400 --> 00:30:21,880 I'm just making your kids laugh all of today.
00:30:21,880 --> 00:30:23,520 Yeah, that was great.
00:30:23,520 --> 00:30:27,600 So first, in your eyes, what is Moore's Law,
00:30:27,600 --> 00:30:29,960 if you could define for people who don't know?
00:30:29,960 --> 00:30:34,360 Well, the simple statement was, from Gordon Moore,
00:30:34,360 --> 00:30:37,960 was double the number of transistors every two years.
00:30:37,960 --> 00:30:39,400 Something like that.
00:30:39,400 --> 00:30:43,320 And then my operational model is,
00:30:43,320 --> 00:30:45,920 we increase the performance of computers
00:30:45,920 --> 00:30:48,600 by two X every two or three years.
00:30:48,600 --> 00:30:51,480 And it's wiggled around substantially over time.
00:30:51,480 --> 00:30:55,260 And also, how we deliver performance has changed.
00:30:55,260 --> 00:31:00,260 But the foundational idea was
00:31:00,500 --> 00:31:02,940 two X to transistors every two years.
00:31:02,940 --> 00:31:05,820 The current cadence is something like,
00:31:05,820 --> 00:31:08,020 they call it a shrink factor.
00:31:08,020 --> 00:31:11,980 Like 0.6 every two years, which is not 0.5.
00:31:11,980 --> 00:31:13,820 But that's referring strictly, again,
00:31:13,820 --> 00:31:15,340 to the original definition of just.
00:31:15,340 --> 00:31:16,700 Yeah, of transistor count.
00:31:16,700 --> 00:31:18,100 A shrink factor's just getting them
00:31:18,100 --> 00:31:19,060 smaller and smaller and smaller.
00:31:19,060 --> 00:31:21,780 Well, it's for a constant chip area.
00:31:21,780 --> 00:31:24,220 If you make the transistors smaller by 0.6,
00:31:24,220 --> 00:31:27,180 then you get one over 0.6 more transistors.
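The shrink arithmetic he walks through can be written out as a quick sketch (illustrative numbers only; `transistors_after` is a made-up helper): at constant chip area, a shrink factor of 0.5 doubles the transistor count per step, while the 0.6 cadence mentioned above gives roughly 1.67x.

```python
def transistors_after(start_count, area_shrink, generations):
    """Transistor count at constant chip area: each generation the
    area per transistor scales by `area_shrink`, so the count
    scales by 1 / area_shrink."""
    count = start_count
    for _ in range(generations):
        count /= area_shrink
    return count

# Classic Moore's Law cadence: 0.5 shrink -> 2x transistors per step.
assert transistors_after(1e9, 0.5, 1) == 2e9
# The ~0.6 cadence mentioned above: only ~1.67x per step, so a
# doubling takes closer to three years than two.
print(round(transistors_after(1e9, 0.6, 1) / 1e9, 2))  # 1.67
```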
00:31:27,180 --> 00:31:29,140 So can you linger on it a little longer?
00:31:29,140 --> 00:31:31,660 What's a broader, what do you think should be
00:31:31,660 --> 00:31:33,920 the broader definition of Moore's Law?
00:31:33,920 --> 00:31:37,920 When you mentioned how you think of performance,
00:31:37,920 --> 00:31:41,500 just broadly, what's a good way to think about Moore's Law?
00:31:42,380 --> 00:31:45,580 Well, first of all, so I've been aware
00:31:45,580 --> 00:31:47,220 of Moore's Law for 30 years.
00:31:48,140 --> 00:31:49,100 In which sense?
00:31:49,100 --> 00:31:52,900 Well, I've been designing computers for 40.
00:31:52,900 --> 00:31:55,460 You're just watching it before your eyes kind of thing.
00:31:55,460 --> 00:31:58,180 Well, and somewhere where I became aware of it,
00:31:58,180 --> 00:31:59,780 I was also informed that Moore's Law
00:31:59,780 --> 00:32:02,260 was gonna die in 10 to 15 years.
00:32:02,260 --> 00:32:03,940 And then I thought that was true at first,
00:32:03,940 --> 00:32:07,260 but then after 10 years, it was gonna die in 10 to 15 years.
00:32:07,260 --> 00:32:09,780 And then at one point, it was gonna die in five years,
00:32:09,780 --> 00:32:11,320 and then it went back up to 10 years,
00:32:11,320 --> 00:32:13,420 and at some point, I decided not to worry
00:32:13,420 --> 00:32:16,660 about that particular prognostication
00:32:16,660 --> 00:32:19,620 for the rest of my life, which is fun.
00:32:19,620 --> 00:32:21,540 And then I joined Intel, and everybody said
00:32:21,540 --> 00:32:23,780 that Moore's Law is dead, and I thought that's sad
00:32:23,780 --> 00:32:25,700 because it's the Moore's Law company,
00:32:25,700 --> 00:32:29,260 and it's not dead, and it's always been gonna die.
00:32:29,260 --> 00:32:33,420 And humans like these apocalyptic kind of statements
00:32:33,420 --> 00:32:36,340 like we'll run out of food, or we'll run out of air,
00:32:36,340 --> 00:32:40,060 or we'll run out of room, or we'll run out of something.
00:32:40,060 --> 00:32:42,020 Right, but it's still incredible
00:32:42,020 --> 00:32:44,700 that it's lived for as long as it has,
00:32:44,700 --> 00:32:47,740 and yes, there's many people who believe now
00:32:47,740 --> 00:32:50,260 that Moore's Law is dead.
00:32:50,260 --> 00:32:52,900 I know, they can join the last 50 years
00:32:52,900 --> 00:32:53,740 of people who had the same idea.
00:32:53,740 --> 00:32:55,460 Yeah, there's a long tradition,
00:32:55,460 --> 00:33:00,460 but why do you think, if you can try to understand it,
00:33:00,900 --> 00:33:03,940 why do you think it's not dead currently?
00:33:03,940 --> 00:33:07,140 Let's just think, people think Moore's Law is one thing,
00:33:07,140 --> 00:33:10,260 transistors get smaller, but actually under the sheet,
00:33:10,260 --> 00:33:12,580 there's literally thousands of innovations,
00:33:12,580 --> 00:33:14,200 and almost all those innovations
00:33:14,200 --> 00:33:17,440 have their own diminishing return curves.
00:33:17,440 --> 00:33:19,460 So if you graph it, it looks like a cascade
00:33:19,460 --> 00:33:21,500 of diminishing return curves.
00:33:21,500 --> 00:33:22,740 I don't know what to call that,
00:33:22,740 --> 00:33:26,540 but the result is an exponential curve,
00:33:26,540 --> 00:33:28,020 but at least it has been.
00:33:28,020 --> 00:33:30,980 So, and we keep inventing new things,
00:33:30,980 --> 00:33:33,020 so if you're an expert in one of the things
00:33:33,020 --> 00:33:36,020 on a diminishing return curve, right,
00:33:36,020 --> 00:33:38,540 and you can see its plateau,
00:33:38,540 --> 00:33:42,300 you will probably tell people, well, this is done.
00:33:42,300 --> 00:33:43,740 Meanwhile, some other pile of people
00:33:43,740 --> 00:33:46,460 are doing something different.
00:33:46,460 --> 00:33:48,340 So that's just normal.
00:33:48,340 --> 00:33:51,340 So then there's the observation of how small
00:33:51,340 --> 00:33:54,100 could a switching device be?
00:33:54,100 --> 00:33:55,820 So a modern transistor is something like
00:33:55,820 --> 00:33:59,940 a thousand by a thousand by a thousand atoms, right?
00:33:59,940 --> 00:34:04,700 And you get quantum effects down around two to 10 atoms.
00:34:04,700 --> 00:34:06,300 So you can imagine the transistor
00:34:06,300 --> 00:34:08,260 as small as 10 by 10 by 10.
00:34:08,260 --> 00:34:12,140 So that's a million times smaller.
00:34:12,140 --> 00:34:14,540 And then the quantum computational people
00:34:14,540 --> 00:34:17,500 are working away at how to use quantum effects.
00:34:17,500 --> 00:34:18,340 So.
00:34:20,020 --> 00:34:21,940 A thousand by a thousand by a thousand.
00:34:21,940 --> 00:34:22,780 Atoms.
00:34:23,780 --> 00:34:26,700 That's a really clean way of putting it.
00:34:26,700 --> 00:34:28,900 Well, a fin, like a modern transistor,
00:34:28,900 --> 00:34:32,100 if you look at the fin, it's like 120 atoms wide,
00:34:32,100 --> 00:34:33,380 but we can make that thinner,
00:34:33,380 --> 00:34:35,740 and then there's a gate wrapped around it,
00:34:35,740 --> 00:34:36,660 and then there's spacing.
00:34:36,660 --> 00:34:38,820 There's a whole bunch of geometry.
00:34:38,820 --> 00:34:42,060 And a competent transistor designer
00:34:42,060 --> 00:34:48,060 could count the atoms in every single direction.
00:34:48,060 --> 00:34:50,540 Like there's techniques now to already put down atoms
00:34:50,540 --> 00:34:53,140 in a single atomic layer.
00:34:53,140 --> 00:34:55,900 And you can place atoms if you want to.
00:34:55,900 --> 00:34:59,660 It's just, you know, from a manufacturing process,
00:34:59,660 --> 00:35:01,380 if placing an atom takes 10 minutes
00:35:01,380 --> 00:35:05,700 and you need to put 10 to the 23rd atoms together
00:35:05,700 --> 00:35:08,860 to make a computer, it would take a long time.
00:35:08,860 --> 00:35:13,380 So the methods are both shrinking things,
00:35:13,380 --> 00:35:15,100 and then coming up with effective ways
00:35:15,100 --> 00:35:17,940 to control what's happening.
00:35:17,940 --> 00:35:20,100 Manufacture stably and cheaply.
00:35:20,100 --> 00:35:21,420 Yeah.
00:35:21,420 --> 00:35:23,540 So the innovation stack's pretty broad.
00:35:23,540 --> 00:35:26,060 You know, there's equipment, there's optics,
00:35:26,060 --> 00:35:27,620 there's chemistry, there's physics,
00:35:27,620 --> 00:35:31,100 there's material science, there's metallurgy.
00:35:31,100 --> 00:35:32,260 There's lots of ideas about
00:35:32,260 --> 00:35:33,740 when you put different materials together,
00:35:33,740 --> 00:35:35,580 how do they interact, are they stable?
00:35:35,580 --> 00:35:37,980 Are they stable over temperature?
00:35:37,980 --> 00:35:40,580 You know, like are they repeatable?
00:35:40,580 --> 00:35:43,100 You know, there's like literally
00:35:43,100 --> 00:35:45,020 thousands of technologies involved.
00:35:45,020 --> 00:35:46,300 But just for the shrinking,
00:35:46,300 --> 00:35:48,580 you don't think we're quite yet close
00:35:48,580 --> 00:35:50,980 to the fundamental limits of physics?
00:35:50,980 --> 00:35:52,180 I did a talk on Moore's Law
00:35:52,180 --> 00:35:54,900 and I asked for a roadmap to a path of 100,
00:35:54,900 --> 00:35:58,900 and after two weeks, they said, we only got to 50.
00:35:58,900 --> 00:35:59,820 100 what, sorry?
00:35:59,820 --> 00:36:00,660 100X shrink.
00:36:00,660 --> 00:36:01,980 100X shrink?
00:36:01,980 --> 00:36:02,820 We only got to 50.
00:36:02,820 --> 00:36:05,500 To 50, and I said, why don't you give it another two weeks?
00:36:05,500 --> 00:36:09,740 Well, here's the thing about Moore's Law, right?
00:36:09,740 --> 00:36:14,260 So I believe that the next 10 or 20 years
00:36:14,260 --> 00:36:16,460 of shrinking is gonna happen, right?
00:36:16,460 --> 00:36:21,020 Now, as a computer designer, you have two stances.
00:36:21,020 --> 00:36:22,540 You think it's going to shrink,
00:36:22,540 --> 00:36:24,860 in which case you're designing
00:36:24,860 --> 00:36:26,260 and thinking about architecture
00:36:26,260 --> 00:36:29,100 in a way that you'll use more transistors.
00:36:29,100 --> 00:36:32,940 Or conversely, not be swamped by the complexity
00:36:32,940 --> 00:36:36,220 of all the transistors you get, right?
00:36:36,220 --> 00:36:39,380 You have to have a strategy, you know?
00:36:39,380 --> 00:36:41,580 So you're open to the possibility
00:36:41,580 --> 00:36:43,100 and waiting for the possibility
00:36:43,100 --> 00:36:46,020 of a whole new army of transistors ready to work.
00:36:46,020 --> 00:36:50,460 I'm expecting more transistors every two or three years
00:36:50,460 --> 00:36:54,420 by a number large enough that how you think about design,
00:36:54,420 --> 00:36:57,260 how you think about architecture has to change.
00:36:57,260 --> 00:37:01,180 Like imagine you build buildings out of bricks,
00:37:01,180 --> 00:37:03,340 and every year the bricks are half the size,
00:37:04,580 --> 00:37:05,940 or every two years.
00:37:05,940 --> 00:37:08,180 Well, if you kept building bricks the same way,
00:37:08,180 --> 00:37:11,340 you know, so many bricks per person per day,
00:37:11,340 --> 00:37:13,660 the amount of time to build a building
00:37:13,660 --> 00:37:17,060 would go up exponentially, right?
00:37:17,060 --> 00:37:19,260 But if you said, I know that's coming,
00:37:19,260 --> 00:37:21,260 so now I'm gonna design equipment
00:37:21,260 --> 00:37:23,540 that moves bricks faster, uses them better,
00:37:23,540 --> 00:37:24,540 because maybe you're getting something
00:37:24,540 --> 00:37:27,580 out of the smaller bricks, more strength, thinner walls,
00:37:27,580 --> 00:37:30,420 you know, less material, more efficiency out of that.
00:37:30,420 --> 00:37:33,340 So once you have a roadmap with what's gonna happen,
00:37:33,340 --> 00:37:34,960 transistors, they're gonna get,
00:37:34,960 --> 00:37:36,600 we're gonna get more of them,
00:37:36,600 --> 00:37:38,840 then you design all this collateral around it
00:37:38,840 --> 00:37:42,500 to take advantage of it, and also to cope with it.
00:37:42,500 --> 00:37:43,820 Like that's the thing people don't understand,
00:37:43,820 --> 00:37:46,200 it's like, if I didn't believe in Moore's law,
00:37:46,200 --> 00:37:48,820 and then Moore's law transistors showed up,
00:37:48,820 --> 00:37:50,560 my design teams were all drowned.
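Reading "half the size" in the brick analogy as half in each linear dimension, the arithmetic runs like this (an illustrative toy; the helper names are invented here):

```python
def bricks_needed(generation):
    """A building of fixed size needs 8x as many bricks each
    generation if bricks halve in every linear dimension."""
    return 8 ** generation

def days_to_build(generation, bricks_per_day=10_000):
    """Build time if the crew keeps laying bricks the old way,
    at a constant number of bricks per day."""
    return bricks_needed(generation) / bricks_per_day

# Same building, same crew: each generation takes 8x longer,
# so build time grows exponentially unless the tooling changes.
assert days_to_build(3) / days_to_build(0) == 8 ** 3
```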
00:37:51,820 --> 00:37:54,380 So what's the, what's the hardest part
00:37:54,380 --> 00:37:57,420 of this inflow of new transistors?
00:37:57,420 --> 00:37:59,540 I mean, even if you just look historically
00:37:59,540 --> 00:38:03,780 throughout your career, what's the thing,
00:38:03,780 --> 00:38:07,020 what fundamentally changes when you add more transistors
00:38:07,020 --> 00:38:10,820 in the task of designing an architecture?
00:38:10,820 --> 00:38:12,540 Well, there's two constants, right?
00:38:12,540 --> 00:38:14,180 One is people don't get smarter.
00:38:16,140 --> 00:38:17,340 By the way, there's some science showing
00:38:17,340 --> 00:38:20,340 that we do get smarter, because of nutrition, whatever.
00:38:21,260 --> 00:38:22,100 Sorry to bring that up.
00:38:22,100 --> 00:38:22,940 Flynn effect.
00:38:22,940 --> 00:38:23,760 Yes.
00:38:23,760 --> 00:38:24,600 Yeah, I'm familiar with it.
00:38:24,600 --> 00:38:25,420 Nobody understands it, nobody knows
00:38:25,420 --> 00:38:27,180 if it's still going on, so that's a.
00:38:27,180 --> 00:38:28,540 Or whether it's real or not.
00:38:28,540 --> 00:38:30,220 But yeah, that's a.
00:38:30,220 --> 00:38:31,300 I sort of.
00:38:31,300 --> 00:38:32,140 Anyway, but not exponentially.
00:38:32,140 --> 00:38:33,480 I would believe for the most part,
00:38:33,480 --> 00:38:35,540 people aren't getting much smarter.
00:38:35,540 --> 00:38:37,540 The evidence doesn't support it, that's right.
00:38:37,540 --> 00:38:40,100 And then teams can't grow that much.
00:38:40,100 --> 00:38:40,940 Right.
00:38:40,940 --> 00:38:43,420 Right, so human beings, you know,
00:38:43,420 --> 00:38:45,780 we're really good in teams of 10,
00:38:45,780 --> 00:38:47,260 you know, up to teams of 100,
00:38:47,260 --> 00:38:48,700 they can know each other, beyond that,
00:38:48,700 --> 00:38:50,840 you have to have organizational boundaries.
00:38:50,840 --> 00:38:51,940 So you're kind of, you have,
00:38:51,940 --> 00:38:54,680 those are pretty hard constraints, right?
00:38:54,680 --> 00:38:56,420 So then you have to divide and conquer,
00:38:56,420 --> 00:38:57,940 like as the designs get bigger,
00:38:57,940 --> 00:39:00,300 you have to divide it into pieces.
00:39:00,300 --> 00:39:03,260 You know, the power of abstraction layers is really high.
00:39:03,260 --> 00:39:06,160 We used to build computers out of transistors.
00:39:06,160 --> 00:39:08,140 Now we have a team that turns transistors
00:39:08,140 --> 00:39:09,420 into logic cells, and another team
00:39:09,420 --> 00:39:10,740 that turns them into functional units,
00:39:10,740 --> 00:39:13,060 another one that turns them into computers.
00:39:13,060 --> 00:39:16,140 Right, so we have abstraction layers in there.
00:39:16,140 --> 00:39:18,980 And you have to think about
00:39:18,980 --> 00:39:21,420 when do you shift gears on that.
00:39:21,420 --> 00:39:24,380 We also use faster computers to build faster computers.
00:39:24,380 --> 00:39:27,860 So some algorithms run twice as fast on new computers.
00:39:27,860 --> 00:39:30,500 But a lot of algorithms are N squared.
00:39:30,500 --> 00:39:33,620 So, you know, a computer with twice as many transistors
00:39:33,620 --> 00:39:36,580 might take four times as long to run,
00:39:36,580 --> 00:39:39,420 so you have to refactor the software.
00:39:39,420 --> 00:39:41,060 Like simply using faster computers
00:39:41,060 --> 00:39:43,120 to build bigger computers doesn't work.
00:39:44,220 --> 00:39:46,300 So you have to think about all these things.
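The scaling trap in that last point, in numbers (an illustrative model; `wall_clock` is a made-up helper): feed a quadratic design tool a chip with twice the transistors on a machine twice as fast, and it still takes twice the wall-clock time.

```python
def wall_clock(design_size, work_exponent, machine_speed):
    """Relative runtime of a design tool whose work grows as
    design_size ** work_exponent, run on a machine of the
    given relative speed."""
    return design_size ** work_exponent / machine_speed

baseline = wall_clock(1, 2, 1)     # today's chip, today's computer
next_gen = wall_clock(2, 2, 2)     # 2x transistors, 2x faster machine
assert next_gen / baseline == 2.0  # the quadratic tool falls behind
# Refactoring the software toward ~N log N work, not just buying
# faster computers, is what keeps the design loop closed.
```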
00:39:46,300 --> 00:39:47,940 So in terms of computing performance
00:39:47,940 --> 00:39:49,340 and the exciting possibility
00:39:49,340 --> 00:39:51,620 that more powerful computers bring,
00:39:51,620 --> 00:39:55,240 is shrinking the thing which you've been talking about,
00:39:55,240 --> 00:39:59,160 one of the, for you, one of the biggest exciting
00:39:59,160 --> 00:40:01,520 possibilities of advancement in performance?
00:40:01,520 --> 00:40:03,920 Or is there other directions that you're interested in,
00:40:03,920 --> 00:40:08,920 like in the direction of sort of enforcing given parallelism
00:40:09,380 --> 00:40:12,240 or like doing massive parallelism
00:40:12,240 --> 00:40:15,040 in terms of many, many CPUs,
00:40:15,040 --> 00:40:17,700 you know, stacking CPUs on top of each other,
00:40:17,700 --> 00:40:20,800 that kind of parallelism, or any kind of parallelism?
00:40:20,800 --> 00:40:22,240 Well, think about it a different way.
00:40:22,240 --> 00:40:25,240 So old computers, you know, slow computers,
00:40:25,240 --> 00:40:28,520 you said A equal B plus C times D.
00:40:28,520 --> 00:40:30,620 Pretty simple, right?
00:40:30,620 --> 00:40:33,520 And then we made faster computers with vector units,
00:40:33,520 --> 00:40:38,520 and you can do proper equations and matrices, right?
00:40:38,520 --> 00:40:41,120 And then modern like AI computations,
00:40:41,120 --> 00:40:43,440 or like convolutional neural networks,
00:40:43,440 --> 00:40:47,120 where you convolve one large data set against another.
00:40:47,120 --> 00:40:51,180 And so there's sort of this hierarchy of mathematics,
00:40:51,180 --> 00:40:54,080 you know, from simple equation to linear equations
00:40:54,080 --> 00:40:58,800 to matrix equations to deeper kind of computation.
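[Editor's note: the hierarchy Jim describes, scalar to vector to matrix to convolution, can be sketched in a few lines of illustrative Python. Plain lists are used so no particular library is assumed; the helper names are made up.]

```python
# Scalar: A = B + C * D, one arithmetic statement
b, c, d = 2.0, 3.0, 4.0
a = b + c * d

# Vector: the same operation applied element-wise, as a vector unit would
bv, cv, dv = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
av = [bi + ci * di for bi, ci, di in zip(bv, cv, dv)]

# Matrix: many dot products, the core of linear-algebra workloads
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# 1-D convolution: slide one data set across another and sum,
# the basic step of a convolutional neural network
def convolve(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```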
00:40:58,800 --> 00:41:00,640 And the data sets are getting so big
00:41:00,640 --> 00:41:04,400 that people are thinking of data as a topology problem.
00:41:04,400 --> 00:41:08,000 You know, data is organized in some immense shape.
00:41:08,000 --> 00:41:11,200 And then the computation sort of wants to
00:41:11,200 --> 00:41:15,360 get data from that immense shape and do some computation on it.
00:41:15,360 --> 00:41:18,160 So what computers have allowed people to do
00:41:18,160 --> 00:41:21,440 is have algorithms go much, much further.
00:41:22,500 --> 00:41:26,680 So that paper you reference, the Sutton paper,
00:41:26,680 --> 00:41:29,140 they talked about, you know, like when AI started,
00:41:29,140 --> 00:41:31,900 it was apply rule sets to something.
00:41:31,900 --> 00:41:35,820 That's a very simple computational situation.
00:41:35,820 --> 00:41:37,880 And then when they did the first chess thing,
00:41:37,880 --> 00:41:39,920 they solved it with deep searches.
00:41:39,920 --> 00:41:44,700 So have a huge database of moves and results, deep search,
00:41:44,700 --> 00:41:48,180 but it's still just a search, right?
00:41:48,180 --> 00:41:51,160 Now we take large numbers of images
00:41:51,160 --> 00:41:54,400 and we use it to train these weight sets
00:41:54,400 --> 00:41:56,280 that we convolve across.
00:41:56,280 --> 00:41:58,920 It's a completely different kind of phenomena.
00:41:58,920 --> 00:41:59,760 We call that AI.
00:41:59,760 --> 00:42:02,460 And now they're doing the next generation.
00:42:02,460 --> 00:42:03,840 And if you look at it,
00:42:03,840 --> 00:42:07,600 they're going up this mathematical graph, right?
00:42:07,600 --> 00:42:11,240 And then computations, both computation and data sets
00:42:11,240 --> 00:42:13,980 support going up that graph.
00:42:13,980 --> 00:42:15,520 Yeah, the kind of computation that might,
00:42:15,520 --> 00:42:18,760 I mean, I would argue that all of it is still a search,
00:42:18,760 --> 00:42:20,040 right?
00:42:20,040 --> 00:42:22,800 Just like you said, a topology problem of data sets,
00:42:22,800 --> 00:42:27,080 you're searching the data sets for valuable data.
00:42:27,080 --> 00:42:30,040 And also the actual optimization of neural networks
00:42:30,040 --> 00:42:33,080 is a kind of search for the-
00:42:33,080 --> 00:42:34,800 I don't know if you had looked at the interlayers
00:42:34,800 --> 00:42:39,120 of finding a cat, it's not a search.
00:42:39,120 --> 00:42:41,160 It's a set of endless projections.
00:42:41,160 --> 00:42:45,720 So, projection, here's a shadow of this phone, right?
00:42:45,720 --> 00:42:47,760 And then you can have a shadow of that on something
00:42:47,760 --> 00:42:49,320 and a shadow on that of something.
00:42:49,320 --> 00:42:50,480 And if you look in the layers,
00:42:50,480 --> 00:42:53,660 you'll see this layer actually describes pointy ears
00:42:53,660 --> 00:42:56,640 and round eyedness and fuzziness.
00:42:56,640 --> 00:43:01,640 But the computation to tease out the attributes
00:43:02,080 --> 00:43:03,760 is not search.
00:43:03,760 --> 00:43:04,600 Right, I mean-
00:43:04,600 --> 00:43:06,040 Like the inference part might be search,
00:43:06,040 --> 00:43:08,000 but the training is not search.
00:43:08,000 --> 00:43:10,840 And then in deep networks, they look at layers
00:43:10,840 --> 00:43:13,200 and they don't even know what's represented.
00:43:14,360 --> 00:43:16,680 And yet if you take the layers out, it doesn't work.
00:43:16,680 --> 00:43:17,520 Okay, so-
00:43:17,520 --> 00:43:18,960 So I don't think it's search.
00:43:18,960 --> 00:43:19,800 All right, well-
00:43:19,800 --> 00:43:21,080 But you'd have to talk to a mathematician
00:43:21,080 --> 00:43:23,000 about what that actually is.
00:43:23,000 --> 00:43:27,020 Well, we could disagree, but it's just semantics,
00:43:27,020 --> 00:43:29,160 I think it's not, but it's certainly not-
00:43:29,160 --> 00:43:31,960 I would say it's absolutely not semantics, but-
00:43:31,960 --> 00:43:35,480 Okay, all right, well, if you want to go there.
00:43:37,080 --> 00:43:39,060 So optimization to me is search,
00:43:39,060 --> 00:43:43,020 and we're trying to optimize the ability
00:43:43,020 --> 00:43:45,880 of a neural network to detect cat ears.
00:43:45,880 --> 00:43:50,880 And the difference between chess and the space,
00:43:51,120 --> 00:43:54,200 the incredibly multidimensional,
00:43:54,200 --> 00:43:57,440 100,000 dimensional space that neural networks
00:43:57,440 --> 00:44:00,280 are trying to optimize over is nothing like
00:44:00,280 --> 00:44:02,280 the chess board database.
00:44:02,280 --> 00:44:04,360 So it's a totally different kind of thing.
00:44:04,360 --> 00:44:06,280 And okay, in that sense, you can say-
00:44:06,280 --> 00:44:07,120 Yeah, yeah.
00:44:07,120 --> 00:44:07,960 It loses the meaning.
00:44:07,960 --> 00:44:11,280 I didn't see how you might say, if you-
00:44:11,280 --> 00:44:12,840 The funny thing is it's the difference
00:44:12,840 --> 00:44:16,560 between given search space and found search space.
00:44:16,560 --> 00:44:17,400 Right, exactly.
00:44:17,400 --> 00:44:18,800 Yeah, maybe that's a different way to describe it.
00:44:18,800 --> 00:44:20,000 That's a beautiful way to put it, okay.
00:44:20,000 --> 00:44:21,760 But you're saying, what's your sense
00:44:21,760 --> 00:44:24,840 in terms of the basic mathematical operations
00:44:24,840 --> 00:44:27,840 and the architectures, computer hardware
00:44:27,840 --> 00:44:29,960 that enables those operations?
00:44:29,960 --> 00:44:33,040 Do you see the CPUs of today still being
00:44:33,040 --> 00:44:36,040 a really core part of executing
00:44:36,040 --> 00:44:37,680 those mathematical operations?
00:44:37,680 --> 00:44:38,600 Yes.
00:44:38,600 --> 00:44:42,320 Well, the operations continue to be add, subtract,
00:44:42,320 --> 00:44:44,680 load, store, compare, and branch.
00:44:44,680 --> 00:44:46,160 It's remarkable.
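[Editor's note: those six operations are enough to sketch a toy machine. This illustrative interpreter, with names and encoding invented for the example, shows add, subtract, load, store, compare, and branch in action.]

```python
def run(program, memory):
    """A toy machine whose only operations are the six Keller names:
    add, subtract, load, store, compare, and branch."""
    pc, flag = 0, False
    regs = {}
    while pc < len(program):
        op, *args = program[pc]
        if op == "load":    regs[args[0]] = memory[args[1]]
        elif op == "store": memory[args[1]] = regs[args[0]]
        elif op == "add":   regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sub":   regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "cmp":   flag = regs[args[0]] < regs[args[1]]
        elif op == "branch":
            if flag:
                pc = args[0]   # taken branch: jump to target
                continue
        pc += 1
    return memory

# Sum memory[0] + memory[1] into memory[2]:
mem = run([("load", "r0", 0), ("load", "r1", 1),
           ("add", "r2", "r0", "r1"), ("store", "r2", 2)],
          [3, 4, 0])
print(mem)  # [3, 4, 7]
```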
00:44:46,160 --> 00:44:48,880 So it's interesting that the building blocks
00:44:48,880 --> 00:44:52,760 of computers are transistors, and under that, atoms.
00:44:52,760 --> 00:44:54,680 So you've got atoms, transistors, logic gates,
00:44:54,680 --> 00:44:58,400 functional units, and computers.
00:44:58,400 --> 00:45:01,040 The building blocks of mathematics at some level
00:45:01,040 --> 00:45:04,480 are things like adds and subtracts and multiplies,
00:45:04,480 --> 00:45:08,400 but the space mathematics can describe
00:45:08,400 --> 00:45:11,280 is, I think, essentially infinite.
00:45:11,280 --> 00:45:14,120 But the computers that run the algorithms
00:45:14,120 --> 00:45:16,680 are still doing the same things.
00:45:16,680 --> 00:45:19,040 Now, a given algorithm might say,
00:45:19,040 --> 00:45:21,960 I need sparse data, or I need 32-bit data,
00:45:21,960 --> 00:45:26,400 or I need, you know, like a convolution operation
00:45:26,400 --> 00:45:29,000 that naturally takes 8-bit data,
00:45:29,000 --> 00:45:31,680 multiplies it and sums it up a certain way.
00:45:31,680 --> 00:45:36,680 So the data types in TensorFlow imply an optimization set,
00:45:38,240 --> 00:45:40,480 but when you go right down and look at the computers,
00:45:40,480 --> 00:45:42,880 it's AND and OR gates doing adds and multiplies.
00:45:42,880 --> 00:45:46,240 Like, that hasn't changed much.
00:45:46,240 --> 00:45:48,600 Now, the quantum researchers think
00:45:48,600 --> 00:45:50,000 they're gonna change that radically,
00:45:50,000 --> 00:45:52,320 and then there's people who think about analog computing
00:45:52,320 --> 00:45:53,160 because you look in the brain
00:45:53,160 --> 00:45:56,360 and it seems to be more analogish, you know,
00:45:56,360 --> 00:45:59,120 that maybe there's a way to do that more efficiently.
00:45:59,120 --> 00:46:03,480 But we have a million X on computation,
00:46:03,480 --> 00:46:08,480 and I don't know the relationship between computational,
00:46:09,280 --> 00:46:10,960 let's say intensity,
00:46:10,960 --> 00:46:14,440 and ability to hit mathematical abstractions.
00:46:15,240 --> 00:46:17,400 I don't know any way to describe that,
00:46:17,400 --> 00:46:19,800 but just like you saw in AI,
00:46:19,800 --> 00:46:23,000 you went from rule sets to simple search
00:46:23,000 --> 00:46:26,400 to complex search to, say, found search.
00:46:26,400 --> 00:46:31,400 Like, those are orders of magnitude more computation to do.
00:46:31,520 --> 00:46:34,640 And as we get the next two orders of magnitude,
00:46:34,640 --> 00:46:36,440 like a friend, Roger Gaduri, said,
00:46:36,440 --> 00:46:39,120 like every order of magnitude changes the computation.
00:46:40,160 --> 00:46:42,680 Fundamentally changes what the computation is doing.
00:46:42,680 --> 00:46:43,520 Yeah.
00:46:44,680 --> 00:46:45,640 Oh, you know the expression,
00:46:45,640 --> 00:46:48,240 the difference in quantity is the difference in kind.
00:46:49,480 --> 00:46:53,000 You know, the difference between ant and anthill, right?
00:46:53,000 --> 00:46:54,640 Or neuron and brain.
00:46:54,640 --> 00:46:58,880 You know, there's this indefinable place
00:46:58,880 --> 00:47:02,480 where the quantity changed the quality, right?
00:47:02,480 --> 00:47:05,000 And we've seen that happen in mathematics multiple times.
00:47:05,000 --> 00:47:08,560 And, you know, my guess is it's gonna keep happening.
00:47:08,560 --> 00:47:09,960 So your sense is, yeah,
00:47:09,960 --> 00:47:14,880 if you focus head down on shrinking the transistor.
00:47:14,880 --> 00:47:15,720 Well, it's not just head down,
00:47:15,720 --> 00:47:18,360 and we're aware of the software stacks
00:47:18,360 --> 00:47:20,400 that are running in the computational loads.
00:47:20,400 --> 00:47:22,040 And we're kind of pondering,
00:47:22,040 --> 00:47:24,520 what do you do with a petabyte of memory
00:47:24,520 --> 00:47:27,080 that wants to be accessed in a sparse way?
00:47:27,080 --> 00:47:28,160 And have, you know,
00:47:28,160 --> 00:47:31,760 the kind of calculations AI programmers want.
00:47:32,680 --> 00:47:34,760 So there's a dialogue interaction.
00:47:34,760 --> 00:47:37,120 But when you go in the computer chip,
00:47:38,080 --> 00:47:41,560 you know, you find adders and subtractors and multipliers.
00:47:43,080 --> 00:47:44,840 So if you zoom out then with,
00:47:44,840 --> 00:47:46,920 as you mentioned, Rich Sutton,
00:47:46,920 --> 00:47:49,280 the idea that most of the development
00:47:49,280 --> 00:47:51,520 in the last many decades in AI research
00:47:51,520 --> 00:47:54,320 came from just leveraging computation
00:47:54,320 --> 00:47:57,920 and just the simple algorithms
00:47:57,920 --> 00:48:00,080 waiting for the computation to improve.
00:48:00,080 --> 00:48:02,000 Well, software guys have a thing
00:48:02,000 --> 00:48:06,080 that they call the problem of premature optimization.
00:48:07,120 --> 00:48:09,200 So if you write a big software stack,
00:48:09,200 --> 00:48:10,720 and if you start optimizing,
00:48:10,720 --> 00:48:12,440 like the first thing you write,
00:48:12,440 --> 00:48:15,480 the odds of that being the performance limiter is low.
00:48:15,480 --> 00:48:16,960 But when you get the whole thing working,
00:48:16,960 --> 00:48:19,800 can you make it 2X faster by optimizing the right things?
00:48:19,800 --> 00:48:21,080 Sure.
00:48:21,080 --> 00:48:22,560 While you're optimizing that,
00:48:22,560 --> 00:48:24,400 could you have written a new software stack
00:48:24,400 --> 00:48:26,080 which would have been a better choice?
00:48:26,080 --> 00:48:27,160 Maybe.
00:48:27,160 --> 00:48:28,600 Now you have creative tension.
00:48:29,560 --> 00:48:30,400 So.
00:48:30,400 --> 00:48:33,200 But the whole time, as you're doing the writing,
00:48:33,200 --> 00:48:34,920 that's the software we're talking about,
00:48:34,920 --> 00:48:36,880 the hardware underneath gets faster and faster.
00:48:36,880 --> 00:48:38,200 Well, this goes back to the Moore's Law.
00:48:38,200 --> 00:48:40,000 If Moore's Law is gonna continue,
00:48:40,000 --> 00:48:45,000 then your AI research should expect that to show up.
00:48:45,880 --> 00:48:48,720 And then you make a slightly different set of choices then.
00:48:48,720 --> 00:48:49,880 We've hit the wall.
00:48:49,880 --> 00:48:51,400 Nothing's gonna happen.
00:48:51,400 --> 00:48:55,040 And from here, it's just us rewriting algorithms.
00:48:55,040 --> 00:48:56,560 Like that seems like a failed strategy,
00:48:56,560 --> 00:48:59,760 given the last 30 years of predictions of Moore's Law's death.
00:48:59,760 --> 00:49:00,600 So.
00:49:00,600 --> 00:49:02,080 So can you just linger on it?
00:49:03,280 --> 00:49:04,560 I think you've answered it,
00:49:04,560 --> 00:49:07,040 but I'll just ask the same dumb question over and over.
00:49:07,040 --> 00:49:12,040 So why do you think Moore's Law is not going to die?
00:49:12,560 --> 00:49:15,800 Which is the most promising, exciting possibility
00:49:15,800 --> 00:49:18,120 of why it won't die in the next five, 10 years?
00:49:18,120 --> 00:49:20,760 So is it the continuous shrinking of the transistor,
00:49:20,760 --> 00:49:24,040 or is it another S-curve that steps in
00:49:24,040 --> 00:49:25,560 and it totally sort of-
00:49:25,560 --> 00:49:27,560 Well, the shrinking of the transistor
00:49:27,560 --> 00:49:30,240 is literally thousands of innovations.
00:49:30,240 --> 00:49:31,280 Right, so there's-
00:49:31,280 --> 00:49:34,840 So there's a whole bunch of S-curves
00:49:34,840 --> 00:49:36,600 just kind of running their course
00:49:36,600 --> 00:49:40,840 and being reinvented and new things.
00:49:40,840 --> 00:49:45,640 The semiconductor fabricators and technologists
00:49:45,640 --> 00:49:47,480 have all announced what's called nanowires.
00:49:47,480 --> 00:49:51,240 So they took a fin which had a gate around it
00:49:51,240 --> 00:49:52,760 and turned that into little wires
00:49:52,760 --> 00:49:55,440 so you have better control of that and they're smaller.
00:49:55,440 --> 00:49:57,320 And then from there, there are some obvious steps
00:49:57,320 --> 00:49:59,440 about how to shrink that.
00:49:59,440 --> 00:50:03,720 So the metallurgy around wire stacks and stuff
00:50:03,720 --> 00:50:07,240 has very obvious abilities to shrink.
00:50:07,240 --> 00:50:11,040 And there's a whole combination of things there to do.
00:50:11,040 --> 00:50:13,560 Your sense is that we're going to get a lot
00:50:13,560 --> 00:50:16,760 of this innovation in performance from just that shrinking.
00:50:16,760 --> 00:50:19,480 Yeah, like a factor of a hundred is a lot.
00:50:19,480 --> 00:50:22,200 Yeah, I would say that's incredible.
00:50:22,200 --> 00:50:23,800 And it's totally-
00:50:23,800 --> 00:50:25,200 It's only 10 or 15 years.
00:50:25,200 --> 00:50:26,480 Now you're smarter, you might know,
00:50:26,480 --> 00:50:28,320 but to me it's totally unpredictable
00:50:28,320 --> 00:50:29,800 what that hundred X would bring
00:50:29,800 --> 00:50:33,440 in terms of the nature of the computation
00:50:33,440 --> 00:50:34,480 that people would be-
00:50:34,480 --> 00:50:37,320 Yeah, are you familiar with Bell's Law?
00:50:37,320 --> 00:50:39,480 So for a long time, it was mainframes,
00:50:39,480 --> 00:50:42,560 minis, workstation, PC, mobile.
00:50:42,560 --> 00:50:46,280 Moore's Law drove faster, smaller computers, right?
00:50:46,280 --> 00:50:49,560 And then when we were thinking about Moore's Law,
00:50:49,560 --> 00:50:53,360 Raja Koduri said, every 10X generates a new computation.
00:50:53,360 --> 00:50:58,360 So scalar, vector, matrix, topological computation, right?
00:51:01,120 --> 00:51:03,880 And if you go look at the industry trends,
00:51:03,880 --> 00:51:07,440 there was mainframes and then minicomputers and then PCs,
00:51:07,440 --> 00:51:08,960 and then the internet took off.
00:51:08,960 --> 00:51:10,800 And then we got mobile devices
00:51:10,800 --> 00:51:12,720 and now we're building 5G wireless
00:51:12,720 --> 00:51:14,840 with one millisecond latency.
00:51:14,840 --> 00:51:17,160 And people are starting to think about the smart world
00:51:17,160 --> 00:51:21,840 where everything knows you, recognizes you.
00:51:21,840 --> 00:51:26,840 Like the transformations are gonna be like unpredictable.
00:51:27,400 --> 00:51:28,960 How does it make you feel
00:51:28,960 --> 00:51:33,520 that you're one of the key architects
00:51:33,520 --> 00:51:35,240 of this kind of future?
00:51:35,240 --> 00:51:37,200 So we're not talking about the architects
00:51:37,200 --> 00:51:42,200 of the high level people who build the Angry Bird apps
00:51:42,320 --> 00:51:43,160 and Snapchat-
00:51:43,160 --> 00:51:45,280 From those Angry Bird apps, who knows?
00:51:45,280 --> 00:51:47,120 Maybe that's the whole point of the universe.
00:51:47,120 --> 00:51:48,880 I'm gonna take a stand at that
00:51:48,880 --> 00:51:52,840 and the attention distracting nature of mobile phones.
00:51:52,840 --> 00:51:53,800 I'll take a stand.
00:51:53,800 --> 00:51:54,640 But anyway, in terms of-
00:51:54,640 --> 00:51:56,400 I don't think that matters much.
00:51:57,600 --> 00:52:01,280 The side effects of smartphones
00:52:01,280 --> 00:52:03,760 or the attention distraction, which part?
00:52:03,760 --> 00:52:06,160 Well, who knows where this is all leading?
00:52:06,160 --> 00:52:07,440 It's changing so fast.
00:52:07,440 --> 00:52:08,280 Wait, so back to the-
00:52:08,280 --> 00:52:09,760 My parents used to yell at my sisters
00:52:09,760 --> 00:52:11,480 for hiding in the closet with a wired phone
00:52:11,480 --> 00:52:13,160 with a dial on it.
00:52:13,160 --> 00:52:14,760 Stop talking to your friends all day.
00:52:14,760 --> 00:52:15,800 Right.
00:52:15,800 --> 00:52:17,280 Now my wife yells at my kids
00:52:17,280 --> 00:52:20,440 for talking to their friends all day on text.
00:52:20,440 --> 00:52:21,800 Well, it looks the same to me.
00:52:21,800 --> 00:52:23,440 It's always, it's echoes of the same thing.
00:52:23,440 --> 00:52:26,720 Okay, but you are one of the key people
00:52:26,720 --> 00:52:29,200 architecting the hardware of this future.
00:52:29,200 --> 00:52:30,560 How does that make you feel?
00:52:30,560 --> 00:52:31,840 Do you feel responsible?
00:52:33,600 --> 00:52:34,960 Do you feel excited?
00:52:36,080 --> 00:52:38,160 So we're in a social context.
00:52:38,160 --> 00:52:40,960 So there's billions of people on this planet.
00:52:40,960 --> 00:52:42,920 There are literally millions of people
00:52:42,920 --> 00:52:44,480 working on technology.
00:52:45,360 --> 00:52:49,920 I feel lucky to be doing what I do
00:52:49,920 --> 00:52:50,920 and getting paid for it.
00:52:50,920 --> 00:52:52,880 And there's an interest in it,
00:52:52,880 --> 00:52:55,360 but there's so many things going on in parallel.
00:52:55,360 --> 00:52:58,400 It's like the actions are so unpredictable.
00:52:58,400 --> 00:53:01,240 If I wasn't here, somebody else would do it.
00:53:01,240 --> 00:53:03,480 The vectors of all these different things
00:53:03,480 --> 00:53:04,920 are happening all the time.
00:53:04,920 --> 00:53:09,920 You know, there's a, I'm sure some philosopher
00:53:10,320 --> 00:53:11,840 or meta philosophers, you know,
00:53:11,840 --> 00:53:14,040 wondering about how we transform our world.
00:53:16,240 --> 00:53:19,200 So you can't deny the fact that these tools,
00:53:19,200 --> 00:53:24,200 whether these tools are changing our world.
00:53:24,480 --> 00:53:25,400 That's right.
00:53:25,400 --> 00:53:29,080 Do you think it's changing for the better?
00:53:29,080 --> 00:53:31,360 Somebody, I read this thing recently.
00:53:31,360 --> 00:53:35,440 It said the two disciplines with the highest GRE scores
00:53:35,440 --> 00:53:39,760 in college are physics and philosophy, right?
00:53:39,760 --> 00:53:41,840 And they're both sort of trying to answer the question,
00:53:41,840 --> 00:53:44,040 why is there anything, right?
00:53:44,040 --> 00:53:45,600 And the philosophers, you know,
00:53:45,600 --> 00:53:47,800 are on the kind of theological side
00:53:47,800 --> 00:53:51,840 and the physicists are obviously on the material side.
00:53:52,720 --> 00:53:55,040 And there's a hundred billion galaxies
00:53:55,040 --> 00:53:57,000 with a hundred billion stars.
00:53:57,000 --> 00:54:00,200 It seems, well, repetitive at best.
00:54:00,200 --> 00:54:05,200 So, you know, there's on our way to 10 billion people.
00:54:06,280 --> 00:54:08,200 I mean, it's hard to say what it's all for
00:54:08,200 --> 00:54:09,600 if that's what you're asking.
00:54:09,600 --> 00:54:11,320 Yeah, I guess I am.
00:54:11,320 --> 00:54:15,080 Things do tend to significantly increase in complexity.
00:54:16,280 --> 00:54:21,280 And I'm curious about how computation,
00:54:21,320 --> 00:54:23,960 like our world, our physical world
00:54:23,960 --> 00:54:25,920 inherently generates mathematics.
00:54:25,920 --> 00:54:26,880 It's kind of obvious, right?
00:54:26,880 --> 00:54:28,880 So we have X, Y, Z coordinates.
00:54:28,880 --> 00:54:30,160 You take a sphere, you make it bigger.
00:54:30,160 --> 00:54:34,120 You get a surface that grows by R squared.
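[Editor's note: the sphere example checks out numerically; a short sketch confirms that surface area scales with the square of the radius.]

```python
import math

def sphere_surface(r):
    # Surface area of a sphere: 4 * pi * r^2
    return 4 * math.pi * r ** 2

# Doubling the radius grows the surface by 2^2 = 4x:
print(sphere_surface(2.0) / sphere_surface(1.0))  # 4.0
```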
00:54:34,120 --> 00:54:36,440 Like it generally generates mathematics
00:54:36,440 --> 00:54:38,800 and the mathematicians and the physicists
00:54:38,800 --> 00:54:39,680 have been having a lot of fun
00:54:39,680 --> 00:54:41,320 talking to each other for years.
00:54:41,320 --> 00:54:46,120 And computation has been, let's say, relatively pedestrian.
00:54:46,120 --> 00:54:48,640 Like computation in terms of mathematics
00:54:48,640 --> 00:54:52,080 has been doing binary algebra
00:54:52,080 --> 00:54:54,520 while those guys have been gallivanting
00:54:54,520 --> 00:54:58,080 through the other realms of possibility, right?
00:54:58,080 --> 00:55:01,000 And now recently the computation
00:55:01,000 --> 00:55:04,840 lets you do mathematical computations
00:55:04,840 --> 00:55:06,600 that are sophisticated enough
00:55:06,600 --> 00:55:09,000 that nobody understands how the answers came out.
00:55:09,880 --> 00:55:10,720 Right?
00:55:10,720 --> 00:55:11,560 Machine learning.
00:55:11,560 --> 00:55:12,400 Machine learning.
00:55:12,400 --> 00:55:16,840 It used to be you'd get a data set, you'd guess at a function.
00:55:16,840 --> 00:55:18,960 The function is considered physics
00:55:18,960 --> 00:55:22,840 if it's predictive of new functions, new data sets.
00:55:22,840 --> 00:55:27,840 Modern, you can take a large data set
00:55:28,440 --> 00:55:30,040 with no intuition about what it is
00:55:30,040 --> 00:55:31,880 and use machine learning to find a pattern
00:55:31,880 --> 00:55:34,320 that has no function, right?
00:55:34,320 --> 00:55:37,600 And it can arrive at results that I don't know
00:55:37,600 --> 00:55:40,040 if they're completely mathematically describable.
00:55:40,040 --> 00:55:44,200 So computation has kind of done something interesting
00:55:44,200 --> 00:55:47,240 compared to A equal B plus C.
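[Editor's note: a minimal sketch of "finding a pattern with no function": a k-nearest-neighbor predictor never writes down a formula, the data set itself is the model. All names and sample points are invented for illustration.]

```python
def knn_predict(data, query, k=3):
    """Predict a value from raw examples alone; no closed-form
    function is ever written down."""
    # Take the k examples whose inputs are closest to the query...
    neighbors = sorted(data, key=lambda xy: abs(xy[0] - query))[:k]
    # ...and average their outputs.
    return sum(y for _, y in neighbors) / k

# Points sampled from some unknown relationship:
samples = [(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8), (4, 8.1)]
print(knn_predict(samples, 2.5))  # interpolates from nearby examples
```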
00:55:47,240 --> 00:55:49,720 There's something reminiscent of that step
00:55:49,720 --> 00:55:53,720 from the basic operations of addition
00:55:54,840 --> 00:55:56,720 to taking a step towards neural networks
00:55:56,720 --> 00:55:59,040 that's reminiscent of what life on Earth
00:55:59,040 --> 00:56:01,120 at its origins was doing.
00:56:01,120 --> 00:56:03,480 Do you think we're creating sort of the next step
00:56:03,480 --> 00:56:05,640 in our evolution in creating
00:56:05,640 --> 00:56:07,960 artificial intelligence systems that will?
00:56:07,960 --> 00:56:08,800 I don't know.
00:56:08,800 --> 00:56:11,080 I mean, there's so much in the universe already
00:56:11,080 --> 00:56:12,720 it's hard to say.
00:56:12,720 --> 00:56:14,080 Where we stand in this whole thing.
00:56:14,080 --> 00:56:17,480 Are human beings working on additional abstraction layers
00:56:17,480 --> 00:56:18,520 and possibilities?
00:56:18,520 --> 00:56:20,360 Yeah, it appears so.
00:56:20,360 --> 00:56:23,040 Does that mean that human beings don't need dogs?
00:56:23,040 --> 00:56:24,200 You know, no.
00:56:24,200 --> 00:56:27,640 Like there's so many things that are all
00:56:27,640 --> 00:56:30,480 simultaneously interesting and useful.
00:56:30,480 --> 00:56:32,520 Well, you've seen, throughout your career,
00:56:32,520 --> 00:56:35,200 you've seen greater and greater level of abstractions
00:56:35,200 --> 00:56:39,600 built in artificial machines, right?
00:56:39,600 --> 00:56:41,320 Do you think, when you look at humans,
00:56:41,320 --> 00:56:44,080 do you think, look at all life on Earth
00:56:44,080 --> 00:56:46,920 as a single organism building this thing,
00:56:46,920 --> 00:56:49,920 this machine with greater and greater levels of abstraction,
00:56:49,920 --> 00:56:52,720 do you think humans are at the peak,
00:56:52,720 --> 00:56:57,400 the top of the food chain in this long arc of history
00:56:57,400 --> 00:57:00,560 on Earth, or do you think we're just somewhere in the middle?
00:57:00,560 --> 00:57:05,280 Are we the basic functional operations of a CPU?
00:57:05,280 --> 00:57:09,280 Are we the C++ program, the Python program?
00:57:09,280 --> 00:57:10,520 Are we the neural network?
00:57:10,520 --> 00:57:12,920 Like somebody's, you know, people have calculated
00:57:12,920 --> 00:57:14,960 like how many operations does the brain do?
00:57:14,960 --> 00:57:17,720 Something, you know, I've seen the number 10 to the 18th
00:57:17,720 --> 00:57:20,640 a bunch of times, arrived at in different ways.
00:57:20,640 --> 00:57:22,000 So could you make a computer
00:57:22,000 --> 00:57:23,840 that did 10 to the 20th operations?
00:57:23,840 --> 00:57:25,280 Yes. Sure.
00:57:25,280 --> 00:57:27,080 Do you think? We're gonna do that.
00:57:27,080 --> 00:57:29,440 Now, is there something magical
00:57:29,440 --> 00:57:31,640 about how brains compute things?
00:57:31,640 --> 00:57:33,000 I don't know.
00:57:33,000 --> 00:57:35,280 You know, my personal experience is interesting
00:57:35,280 --> 00:57:37,800 because, you know, you think you know how you think
00:57:37,800 --> 00:57:39,040 and then you have all these ideas
00:57:39,040 --> 00:57:41,520 and you can't figure out how they happened.
00:57:41,520 --> 00:57:45,920 And if you meditate, you know, like what,
00:57:45,920 --> 00:57:48,680 what you can be aware of is interesting.
00:57:48,680 --> 00:57:51,760 So I don't know if brains are magical or not.
00:57:51,760 --> 00:57:54,800 You know, the physical evidence says no.
00:57:54,800 --> 00:57:57,880 Lots of people's personal experience says yes.
00:57:57,880 --> 00:58:01,320 So what would be funny is if brains are magical
00:58:01,320 --> 00:58:04,640 and yet we can make brains with more computation.
00:58:04,640 --> 00:58:07,080 You know, I don't know what to say about that, but.
00:58:07,080 --> 00:58:10,480 Well, do you think magic is an emergent phenomena?
00:58:10,480 --> 00:58:13,880 Would be, I have no explanation for it.
00:58:13,880 --> 00:58:17,760 Let me ask Jim Keller of what in your view is consciousness?
00:58:19,280 --> 00:58:20,640 With consciousness?
00:58:20,640 --> 00:58:25,520 Yeah, like what, you know, consciousness, love,
00:58:25,520 --> 00:58:27,720 things that are these deeply human things
00:58:27,720 --> 00:58:29,600 that seems to emerge from our brain.
00:58:29,600 --> 00:58:34,440 Is that something that we'll be able to make in code
00:58:34,440 --> 00:58:38,200 in chips that get faster and faster and faster and faster?
00:58:38,200 --> 00:58:39,920 That's like a 10 hour conversation.
00:58:39,920 --> 00:58:41,040 No, nobody really knows.
00:58:41,040 --> 00:58:45,320 Can you summarize it in a couple of sentences?
00:58:45,320 --> 00:58:48,880 Many people have observed that organisms run
00:58:48,880 --> 00:58:51,520 at lots of different levels, right?
00:58:51,520 --> 00:58:52,880 If you had two neurons, somebody said
00:58:52,880 --> 00:58:56,920 you'd have one sensory neuron and one motor neuron, right?
00:58:56,920 --> 00:58:58,840 So we move towards things and away from things
00:58:58,840 --> 00:59:03,200 and we have physical integrity and safety or not, right?
00:59:03,200 --> 00:59:05,720 And then if you look at the animal kingdom,
00:59:05,720 --> 00:59:08,360 you can see brains that are a little more complicated.
00:59:08,360 --> 00:59:10,320 And at some point there's a planning system
00:59:10,320 --> 00:59:12,000 and then there's an emotional system
00:59:12,000 --> 00:59:14,400 that's, you know, happy about being safe
00:59:14,400 --> 00:59:17,240 or unhappy about being threatened, right?
00:59:17,240 --> 00:59:21,680 And then our brains have massive numbers of structures,
00:59:21,680 --> 00:59:24,960 you know, like planning and movement and thinking
00:59:24,960 --> 00:59:27,960 and feeling and drives and emotions.
00:59:27,960 --> 00:59:31,160 And we seem to have multiple layers of thinking systems.
00:59:31,160 --> 00:59:32,840 And we have a brain, a dream system
00:59:32,840 --> 00:59:35,280 that nobody understands whatsoever,
00:59:35,280 --> 00:59:37,520 which I find completely hilarious.
00:59:37,520 --> 00:59:41,520 And you can think in a way
00:59:41,520 --> 00:59:45,680 that those systems are more independent
00:59:45,680 --> 00:59:46,840 and you can observe, you know,
00:59:46,840 --> 00:59:49,560 the different parts of yourself can observe them.
00:59:49,560 --> 00:59:51,400 I don't know which one's magical.
00:59:51,400 --> 00:59:53,620 I don't know which one's not computational.
00:59:55,360 --> 00:59:56,800 So.
00:59:56,800 --> 00:59:58,920 Is it possible that it's all computation?
00:59:58,920 --> 01:00:00,120 Probably.
01:00:00,120 --> 01:00:01,560 Is there a limit to computation?
01:00:01,560 --> 01:00:03,200 I don't think so.
01:00:03,200 --> 01:00:05,240 Do you think the universe is a computer?
01:00:05,240 --> 01:00:07,440 I don't know, it seems to be.
01:00:07,440 --> 01:00:09,600 It's a weird kind of computer
01:00:09,600 --> 01:00:12,600 because if it was a computer, right,
01:00:12,600 --> 01:00:15,360 like when they do calculations on
01:00:15,360 --> 01:00:17,600 how much calculation it takes to describe
01:00:17,600 --> 01:00:20,960 quantum effects, it's unbelievably high.
01:00:20,960 --> 01:00:22,200 So if it was a computer,
01:00:22,200 --> 01:00:23,560 wouldn't you have built it out of something
01:00:23,560 --> 01:00:25,080 that was easier to compute?
01:00:26,040 --> 01:00:29,600 Right, that's a funny, it's a funny system.
01:00:29,600 --> 01:00:31,360 But then the simulation guys have pointed out
01:00:31,360 --> 01:00:32,740 that the rules are kind of interesting.
01:00:32,740 --> 01:00:35,120 Like when you look really close, it's uncertain.
01:00:35,120 --> 01:00:37,720 And the speed of light says you can only look so far.
01:00:37,720 --> 01:00:39,200 And things can't be simultaneous
01:00:39,200 --> 01:00:41,280 except for the odd entanglement problem
01:00:41,280 --> 01:00:42,600 where they seem to be.
01:00:42,600 --> 01:00:45,120 Like the rules are all kind of weird.
01:00:45,120 --> 01:00:48,880 And somebody said physics is like having 50 equations
01:00:48,880 --> 01:00:52,080 with 50 variables to define 50 variables.
01:00:52,080 --> 01:00:55,280 Like, you know, it's, you know,
01:00:55,280 --> 01:00:57,000 like physics itself has been a shit show
01:00:57,000 --> 01:00:59,060 for thousands of years.
01:00:59,060 --> 01:01:01,800 It seems odd when you get to the corners of everything,
01:01:01,800 --> 01:01:03,720 you know, it's either uncomputable
01:01:03,720 --> 01:01:07,240 or undefinable or uncertain.
01:01:07,240 --> 01:01:09,400 It's almost like the designers of the simulation
01:01:09,400 --> 01:01:12,880 are trying to prevent us from understanding it perfectly.
01:01:12,880 --> 01:01:16,200 But also the things that require calculations
01:01:16,200 --> 01:01:18,560 require so much calculation that our idea
01:01:18,560 --> 01:01:20,880 of the universe as a computer is absurd
01:01:20,880 --> 01:01:23,160 because every single little bit of it
01:01:23,160 --> 01:01:26,720 takes all the computation in the universe to figure out.
01:01:26,720 --> 01:01:28,160 So that's a weird kind of computer.
01:01:28,160 --> 01:01:30,960 You know, you say the simulation is running in the computer
01:01:30,960 --> 01:01:34,600 which has, by definition, infinite computation.
01:01:34,600 --> 01:01:35,520 Not infinite.
01:01:35,520 --> 01:01:37,760 Oh, you mean if the universe is infinite.
01:01:37,760 --> 01:01:40,800 Yeah, well, every little piece of our universe
01:01:40,800 --> 01:01:43,320 seems to take infinite computation to figure out.
01:01:43,320 --> 01:01:44,300 Not infinite, just a lot.
01:01:44,300 --> 01:01:46,120 Well, a lot, some pretty big number.
01:01:46,120 --> 01:01:50,400 To compute this little teeny spot takes all the mass
01:01:50,400 --> 01:01:53,520 in the local one light year by one light year space.
01:01:53,520 --> 01:01:55,000 It's close enough to infinite.
01:01:55,000 --> 01:01:56,720 So it's a heck of a computer if it is one.
01:01:56,720 --> 01:02:00,080 I know, it's a weird description
01:02:00,080 --> 01:02:03,200 because the simulation description seems to break
01:02:03,200 --> 01:02:05,040 when you look closely at it.
01:02:05,040 --> 01:02:06,080 But the rules of the universe
01:02:06,080 --> 01:02:07,820 seem to imply something's up.
01:02:08,920 --> 01:02:11,000 That seems a little arbitrary.
01:02:11,000 --> 01:02:15,040 The universe, the whole thing, the laws of physics,
01:02:15,040 --> 01:02:20,040 it just seems like how did it come out to be the way it is?
01:02:20,240 --> 01:02:22,740 Well, lots of people talk about that.
01:02:22,740 --> 01:02:24,560 Like I said, the two smartest groups of humans
01:02:24,560 --> 01:02:27,840 are working on the same problem from different aspects
01:02:27,840 --> 01:02:30,120 and they're both complete failures.
01:02:30,120 --> 01:02:31,560 So that's kind of cool.
01:02:32,760 --> 01:02:34,280 They might succeed eventually.
01:02:35,400 --> 01:02:38,360 Well, after 2,000 years, the trend isn't good.
01:02:38,360 --> 01:02:40,240 2,000 years is nothing in the span
01:02:40,240 --> 01:02:41,600 of the history of the universe.
01:02:41,600 --> 01:02:43,440 So we have some time.
01:02:43,440 --> 01:02:45,840 But the next 1,000 years doesn't look good either.
01:02:47,340 --> 01:02:49,000 That's what everybody says at every stage.
01:02:49,000 --> 01:02:51,480 But with Moore's Law, as you've just described,
01:02:51,480 --> 01:02:55,320 not being dead, the exponential growth of technology,
01:02:55,320 --> 01:02:57,800 the future seems pretty incredible.
01:02:57,800 --> 01:02:59,680 Well, it'll be interesting, that's for sure.
01:02:59,680 --> 01:03:00,520 That's right.
01:03:00,520 --> 01:03:03,920 So what are your thoughts on Ray Kurzweil's sense
01:03:03,920 --> 01:03:05,960 that exponential improvement in technology
01:03:05,960 --> 01:03:07,680 will continue indefinitely?
01:03:07,680 --> 01:03:10,920 Is that how you see Moore's Law?
01:03:10,920 --> 01:03:13,160 Do you see Moore's Law more broadly
01:03:13,160 --> 01:03:16,980 in the sense that technology of all kinds
01:03:16,980 --> 01:03:21,320 has a way of stacking S curves on top of each other
01:03:21,320 --> 01:03:24,600 where it'll be exponential and then we'll see all kinds of.
01:03:24,600 --> 01:03:27,720 What does an exponential of a million mean?
01:03:27,720 --> 01:03:29,480 That's a pretty amazing number.
01:03:29,480 --> 01:03:32,240 And that's just for a local little piece of silicon.
01:03:32,240 --> 01:03:35,800 Now let's imagine you, say, decided to get
01:03:37,020 --> 01:03:41,560 1,000 tons of silicon to collaborate in one computer
01:03:41,560 --> 01:03:43,220 at a million times the density.
01:03:44,400 --> 01:03:46,880 Like now you're talking, I don't know,
01:03:46,880 --> 01:03:49,840 10 to the 20th more computation power
01:03:49,840 --> 01:03:52,880 than our current already unbelievably fast computers.
01:03:54,080 --> 01:03:55,840 Like nobody knows what that's gonna mean.
01:03:55,840 --> 01:03:59,000 The sci-fi guys call it computronium.
01:03:59,000 --> 01:04:01,320 Like when a local civilization
01:04:01,320 --> 01:04:03,900 turns the nearby star into a computer.
01:04:04,800 --> 01:04:06,760 Like, I don't know if that's true.
01:04:06,760 --> 01:04:11,600 So just even when you shrink a transistor, the.
01:04:11,600 --> 01:04:12,640 That's only one dimension.
01:04:12,640 --> 01:04:14,160 The ripple effects of that.
01:04:14,160 --> 01:04:16,000 Like people tend to think about computers
01:04:16,000 --> 01:04:17,680 as a cost problem, right?
01:04:17,680 --> 01:04:19,400 So computers are made out of silicon
01:04:19,400 --> 01:04:24,400 and minor amounts of metals and this and that.
01:04:24,400 --> 01:04:26,960 None of those things cost any money.
01:04:26,960 --> 01:04:28,760 Like there's plenty of sand.
01:04:28,760 --> 01:04:31,160 Like you could just turn the beach
01:04:31,160 --> 01:04:33,380 and a little bit of ocean water into computers.
01:04:33,380 --> 01:04:36,740 So all the cost is in the equipment to do it.
01:04:36,740 --> 01:04:38,840 And the trend on equipment is
01:04:38,840 --> 01:04:40,640 once you figure out how to build the equipment,
01:04:40,640 --> 01:04:41,840 the trend of cost is zero.
01:04:41,840 --> 01:04:45,960 Elon said, first you figure out what configuration
01:04:45,960 --> 01:04:50,320 you want the atoms in and then how to put them there, right?
01:04:50,320 --> 01:04:51,160 Yeah.
01:04:51,160 --> 01:04:54,960 Well, here's the, you know, his great insight is
01:04:54,960 --> 01:04:56,560 people are "how" constrained.
01:04:56,560 --> 01:04:58,760 I have this thing, I know how it works.
01:04:58,760 --> 01:05:02,380 And then little tweaks to that will generate something
01:05:02,380 --> 01:05:05,240 as opposed to what do I actually want
01:05:05,240 --> 01:05:07,120 and then figure out how to build it.
01:05:07,120 --> 01:05:09,360 It's a very different mindset
01:05:09,360 --> 01:05:11,440 and almost nobody has it, obviously.
01:05:12,920 --> 01:05:15,840 Well, let me ask on that topic.
01:05:15,840 --> 01:05:18,160 You were one of the key early people
01:05:18,160 --> 01:05:20,240 in the development of autopilot,
01:05:20,240 --> 01:05:22,560 at least in the hardware side.
01:05:22,560 --> 01:05:25,520 Elon Musk believes that autopilot and vehicle autonomy,
01:05:25,520 --> 01:05:26,760 if you just look at that problem,
01:05:26,760 --> 01:05:29,520 can follow this kind of exponential improvement.
01:05:29,520 --> 01:05:32,640 In terms of the "how" question that we're talking about,
01:05:32,640 --> 01:05:34,740 there's no reason why you can't.
01:05:34,740 --> 01:05:37,360 What are your thoughts on this particular space
01:05:37,360 --> 01:05:42,360 of vehicle autonomy and your part of it
01:05:42,360 --> 01:05:45,320 and Elon Musk's and Tesla's vision for vehicle autonomy?
01:05:45,320 --> 01:05:48,800 Well, the computer you need to build was straightforward.
01:05:48,800 --> 01:05:51,180 And you could argue, well, does it need to be
01:05:51,180 --> 01:05:53,640 two times faster or five times or 10 times?
01:05:54,580 --> 01:05:58,480 But that's just a matter of time or price in the short run.
01:05:58,480 --> 01:06:00,280 So that's not a big deal.
01:06:00,280 --> 01:06:03,320 You don't have to be especially smart to drive a car.
01:06:03,320 --> 01:06:05,760 So it's not like a super hard problem.
01:06:05,760 --> 01:06:07,980 I mean, the big problem with safety is attention,
01:06:07,980 --> 01:06:11,160 which computers are really good at, not skills.
01:06:12,920 --> 01:06:15,280 Well, let me push back on one.
01:06:15,280 --> 01:06:17,200 You see, everything you said is correct,
01:06:17,200 --> 01:06:22,200 but we as humans tend to take for granted
01:06:24,320 --> 01:06:27,960 how incredible our vision system is, so.
01:06:27,960 --> 01:06:30,680 You can drive a car with 20/50 vision
01:06:30,680 --> 01:06:32,320 and you can train a neural network
01:06:32,320 --> 01:06:34,760 to extract the distance of any object
01:06:34,760 --> 01:06:38,620 and the shape of any surface from a video and data.
01:06:38,620 --> 01:06:40,240 Yeah, but that's. It's really simple.
01:06:40,240 --> 01:06:42,200 No, it's not simple.
01:06:42,200 --> 01:06:44,480 That's a simple data problem.
01:06:44,480 --> 01:06:46,400 It's not, it's not simple.
01:06:46,400 --> 01:06:50,520 It's because it's not just detecting objects.
01:06:50,520 --> 01:06:52,320 It's understanding the scene
01:06:52,320 --> 01:06:54,360 and it's being able to do it in a way
01:06:54,360 --> 01:06:56,640 that doesn't make errors.
01:06:56,640 --> 01:07:00,060 So the beautiful thing about the human vision system
01:07:00,060 --> 01:07:02,640 and our entire brain around the whole thing
01:07:02,640 --> 01:07:05,560 is we're able to fill in the gaps.
01:07:05,560 --> 01:07:08,240 It's not just about perfectly detecting cars.
01:07:08,240 --> 01:07:10,000 It's inferring the occluded cars.
01:07:10,000 --> 01:07:12,440 It's trying to, it's understanding the physics.
01:07:12,440 --> 01:07:14,600 I think that's mostly a data problem.
01:07:14,600 --> 01:07:17,720 So you think what data would compute
01:07:17,720 --> 01:07:19,240 with improvement of computation,
01:07:19,240 --> 01:07:20,800 with improvement in collection.
01:07:20,800 --> 01:07:22,680 Well, there's a, you know, when you're driving a car
01:07:22,680 --> 01:07:23,660 and somebody cuts you off,
01:07:23,660 --> 01:07:26,180 your brain has theories about why they did it.
01:07:26,180 --> 01:07:27,560 You know, they're a bad person,
01:07:27,560 --> 01:07:29,960 they're distracted, they're dumb.
01:07:29,960 --> 01:07:32,860 You know, you can listen to yourself, right?
01:07:32,860 --> 01:07:37,080 So, you know, if you think that narrative is important
01:07:37,080 --> 01:07:38,880 to be able to successfully drive a car,
01:07:38,880 --> 01:07:41,680 then current autopilot systems can't do it.
01:07:41,680 --> 01:07:43,800 But if cars are ballistic things
01:07:43,800 --> 01:07:45,800 with tracks and probabilistic changes
01:07:45,800 --> 01:07:47,360 of speed and direction,
01:07:47,360 --> 01:07:50,240 and roads are fixed and given, by the way,
01:07:50,240 --> 01:07:53,320 they don't change dynamically, right?
01:07:53,320 --> 01:07:56,360 You can map the world really thoroughly.
01:07:56,360 --> 01:08:00,360 You can place every object really thoroughly, right?
01:08:01,560 --> 01:08:04,820 You can calculate trajectories of things really thoroughly.
01:08:06,120 --> 01:08:06,960 Right?
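(A minimal sketch of the "cars are ballistic things with tracks" framing above, assuming a plain constant-velocity model; the function name and numbers are illustrative, not any real planner's code.)

```python
# Toy illustration of treating another car as a ballistic object:
# predict where it will be by extrapolating its current position
# and velocity. All values here are hypothetical.

def predict_position(pos, vel, dt):
    """Constant-velocity extrapolation; pos and vel are (x, y) tuples
    in meters and meters/second, dt is seconds ahead."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

# A car 10 m ahead, closing at 5 m/s along x:
future = predict_position(pos=(10.0, 0.0), vel=(-5.0, 0.0), dt=1.0)
# One second later the model places it 5 m ahead.
```

A real system would layer probabilistic changes of speed and direction on top of this, but the deterministic core really is this small.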
01:08:06,960 --> 01:08:09,880 But everything you said about really thoroughly
01:08:09,880 --> 01:08:12,560 has a different degree of difficulty.
01:08:12,560 --> 01:08:15,120 And you could say, at some point,
01:08:15,120 --> 01:08:17,680 computer autonomous systems will be way better
01:08:17,680 --> 01:08:20,080 at things that humans are lousy at.
01:08:20,080 --> 01:08:22,520 Like, they'll be better at attention,
01:08:22,520 --> 01:08:25,080 they'll always remember there was a pothole in the road
01:08:25,080 --> 01:08:27,400 that humans keep forgetting about,
01:08:27,400 --> 01:08:29,480 they'll remember that this set of roads
01:08:29,480 --> 01:08:31,240 has these weirdo lines on it
01:08:31,240 --> 01:08:32,800 that the computers figured out once,
01:08:32,800 --> 01:08:35,200 and especially if they get updates
01:08:35,200 --> 01:08:38,000 so that somebody changes a given.
01:08:38,000 --> 01:08:40,680 Like, the key to robots and stuff,
01:08:40,680 --> 01:08:44,400 somebody said, is to maximize the givens, right?
01:08:44,400 --> 01:08:45,240 Right.
01:08:45,240 --> 01:08:48,000 Right, so having a robot pick up this bottle cap
01:08:48,000 --> 01:08:51,040 is way easier if you put a red dot on the top,
01:08:51,040 --> 01:08:52,720 because then you don't have to figure it out,
01:08:52,720 --> 01:08:54,880 and if you wanna do a certain thing with it,
01:08:54,880 --> 01:08:57,200 maximize the givens is the thing.
01:08:57,200 --> 01:09:00,280 And autonomous systems are happily maximizing the givens.
01:09:01,120 --> 01:09:04,200 Like, humans, when you drive someplace new,
01:09:04,200 --> 01:09:06,960 you remember it because you're processing it the whole time,
01:09:06,960 --> 01:09:08,960 and after the 50th time you drove to work,
01:09:08,960 --> 01:09:11,480 you get to work, you don't know how you got there, right?
01:09:11,480 --> 01:09:14,840 You're on autopilot, right?
01:09:14,840 --> 01:09:17,800 Autonomous cars are always on autopilot.
01:09:17,800 --> 01:09:20,400 But the cars have no theories about why they got cut off
01:09:20,400 --> 01:09:22,160 or why they're in traffic.
01:09:22,160 --> 01:09:24,720 So they also never stop paying attention.
01:09:24,720 --> 01:09:28,000 Right, so I tend to believe you do have to have theories,
01:09:28,000 --> 01:09:30,000 mental models of other people,
01:09:30,000 --> 01:09:31,440 especially with pedestrians and cyclists,
01:09:31,440 --> 01:09:32,840 but also with other cars.
01:09:32,840 --> 01:09:37,840 So everything you said is actually essential
01:09:37,840 --> 01:09:38,920 to driving.
01:09:38,920 --> 01:09:41,760 Driving is a lot more complicated than people realize,
01:09:41,760 --> 01:09:43,880 I think, so sort of to push back slightly.
01:09:43,880 --> 01:09:44,720 But to-
01:09:44,720 --> 01:09:47,080 So to cut into traffic, right?
01:09:47,080 --> 01:09:48,480 You can't just wait for a gap.
01:09:48,480 --> 01:09:50,280 You have to be somewhat aggressive.
01:09:50,280 --> 01:09:53,840 You'll be surprised how simple a calculation for that is.
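(A hedged sketch of how simple the merge calculation Jim alludes to could be: accept a gap if the trailing car won't close it within a minimum time gap. The 2-second threshold and function names are illustrative assumptions, not a real specification.)

```python
# Toy gap-acceptance check for cutting into traffic.
# The minimum time gap of 2 seconds is an assumed, illustrative value.

def gap_is_acceptable(gap_m, closing_speed_mps, min_time_gap_s=2.0):
    """Accept the gap if the trailing car would take at least
    min_time_gap_s to close it. A non-closing gap is always fine."""
    if closing_speed_mps <= 0:
        return True
    return gap_m / closing_speed_mps >= min_time_gap_s

gap_is_acceptable(30.0, 10.0)  # 3 s to close: acceptable
gap_is_acceptable(15.0, 10.0)  # 1.5 s to close: too tight
```

Whether a threshold rule like this suffices in dense, adversarial traffic is exactly the disagreement in the conversation that follows.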
01:09:53,840 --> 01:09:55,520 I may be on that particular point,
01:09:55,520 --> 01:10:00,360 but there's, maybe I actually have to push back.
01:10:00,360 --> 01:10:01,640 I would be surprised.
01:10:01,640 --> 01:10:03,080 You know what, yeah, I'll just say where I stand.
01:10:03,080 --> 01:10:06,240 I would be very surprised, but I think it's,
01:10:06,240 --> 01:10:09,240 you might be surprised how complicated it is.
01:10:09,240 --> 01:10:10,080 That-
01:10:10,080 --> 01:10:12,680 I tell people, progress disappoints in the short run,
01:10:12,680 --> 01:10:14,000 surprises in the long run.
01:10:14,000 --> 01:10:15,640 It's very possible, yeah.
01:10:15,640 --> 01:10:19,040 I suspect in 10 years, it'll be just taken for granted.
01:10:19,040 --> 01:10:19,920 Yeah, probably.
01:10:19,920 --> 01:10:22,120 But you're probably right, not look like-
01:10:22,120 --> 01:10:25,120 It's gonna be a $50 solution that nobody cares about.
01:10:25,120 --> 01:10:27,320 It's like GPS is like, wow, GPS is,
01:10:27,320 --> 01:10:30,120 we have satellites in space that tell you
01:10:30,120 --> 01:10:30,960 where your location is.
01:10:30,960 --> 01:10:33,560 It was a really big deal, and now everything has a GPS in it.
01:10:33,560 --> 01:10:34,400 Yeah, that's true.
01:10:34,400 --> 01:10:38,920 I do think that systems that involve human behavior
01:10:38,920 --> 01:10:40,840 are more complicated than we give them credit for.
01:10:40,840 --> 01:10:43,560 So we can do incredible things with technology
01:10:43,560 --> 01:10:45,040 that don't involve humans.
01:10:45,040 --> 01:10:45,880 But when you-
01:10:45,880 --> 01:10:48,440 I think humans are less complicated than people,
01:10:48,440 --> 01:10:50,600 you know, frequently ascribe.
01:10:50,600 --> 01:10:51,440 Maybe I feel-
01:10:51,440 --> 01:10:53,760 We tend to operate out of large numbers of patterns
01:10:53,760 --> 01:10:55,840 and just keep doing it over and over.
01:10:55,840 --> 01:10:58,080 But I can't trust you because you're a human.
01:10:58,080 --> 01:11:00,800 That's something a human would say.
01:11:00,800 --> 01:11:04,640 But my hope is on the point you've made is,
01:11:04,640 --> 01:11:07,320 even if, no matter who's right,
01:11:08,880 --> 01:11:10,680 I'm hoping that there's a lot of things
01:11:10,680 --> 01:11:11,920 that humans aren't good at
01:11:11,920 --> 01:11:13,480 that machines are definitely good at,
01:11:13,480 --> 01:11:15,680 like you said, attention and things like that.
01:11:15,680 --> 01:11:17,720 Well, they'll be so much better
01:11:17,720 --> 01:11:21,040 that the overall picture of safety and autonomy
01:11:21,040 --> 01:11:22,920 will be, obviously, cars will be safer,
01:11:22,920 --> 01:11:24,760 even if they're not as good at understanding-
01:11:24,760 --> 01:11:26,440 I'm a big believer in safety.
01:11:26,440 --> 01:11:29,660 I mean, there are already, the current safety systems,
01:11:29,660 --> 01:11:32,080 like cruise control that doesn't let you run into people
01:11:32,080 --> 01:11:33,400 and lane keeping.
01:11:33,400 --> 01:11:34,720 There are so many features
01:11:34,720 --> 01:11:37,800 that you just look at the parade of accidents
01:11:37,800 --> 01:11:40,800 and knocking off like 80% of them is, you know,
01:11:40,800 --> 01:11:42,500 super doable.
01:11:42,500 --> 01:11:44,720 Just to linger on the autopilot team
01:11:44,720 --> 01:11:45,920 and the efforts there,
01:11:48,040 --> 01:11:51,740 it seems to be that there's a very intense scrutiny
01:11:51,740 --> 01:11:54,360 by the media and the public in terms of safety,
01:11:54,360 --> 01:11:58,040 the pressure, the bar put before autonomous vehicles.
01:11:58,040 --> 01:12:01,780 What are your, sort of as a person there
01:12:01,780 --> 01:12:03,920 working on the hardware and trying to build a system
01:12:03,920 --> 01:12:07,280 that builds a safe vehicle and so on,
01:12:07,280 --> 01:12:09,000 what was your sense about that pressure?
01:12:09,000 --> 01:12:09,960 Is it unfair?
01:12:09,960 --> 01:12:12,360 Is it expected of new technology?
01:12:12,360 --> 01:12:13,560 Yeah, it seems reasonable.
01:12:13,560 --> 01:12:15,480 I was interested, I talked to both American
01:12:15,480 --> 01:12:17,320 and European regulators,
01:12:17,320 --> 01:12:21,280 and I was worried that the regulations
01:12:21,280 --> 01:12:25,160 would write into the rules technology solutions,
01:12:25,160 --> 01:12:30,080 like modern brake systems imply hydraulic brakes.
01:12:30,080 --> 01:12:32,200 So if you read the regulations,
01:12:32,200 --> 01:12:35,120 to meet the letter of the law for brakes,
01:12:35,120 --> 01:12:37,840 it sort of has to be hydraulic, right?
01:12:37,840 --> 01:12:42,100 And the regulator said they're interested in the use cases,
01:12:42,100 --> 01:12:44,400 like a head-on crash, an offset crash,
01:12:44,400 --> 01:12:47,120 don't hit pedestrians, don't run into people,
01:12:47,120 --> 01:12:50,440 don't leave the road, don't run a red light or a stoplight.
01:12:50,440 --> 01:12:53,200 They were very much into the scenarios.
01:12:53,200 --> 01:12:56,960 And they had all the data about which scenarios
01:12:56,960 --> 01:12:59,360 injured or killed the most people.
01:12:59,360 --> 01:13:04,080 And for the most part, those conversations were like,
01:13:04,080 --> 01:13:08,840 what's the right thing to do to take the next step?
01:13:08,840 --> 01:13:12,040 Now, Elon's very interested also in the benefits
01:13:12,040 --> 01:13:14,200 of autonomous driving or freeing people's time
01:13:14,200 --> 01:13:16,560 and attention, as well as safety.
01:13:18,640 --> 01:13:20,380 And I think that's also an interesting thing,
01:13:20,380 --> 01:13:25,200 but building autonomous systems so they're safe
01:13:25,200 --> 01:13:27,440 and safer than people seemed,
01:13:27,440 --> 01:13:30,200 since the goal is to be 10X safer than people,
01:13:30,200 --> 01:13:32,240 having the bar to be safer than people
01:13:32,240 --> 01:13:37,240 and scrutinizing accidents seems philosophically correct.
01:13:39,280 --> 01:13:41,040 So I think that's a good thing.
01:13:41,040 --> 01:13:46,040 What is different, compared to the things you worked on at
01:13:46,040 --> 01:13:51,040 Intel, AMD, Apple, with autopilot chip design
01:13:51,640 --> 01:13:54,360 and hardware design, what are interesting
01:13:54,360 --> 01:13:56,720 or challenging aspects of building this specialized
01:13:56,720 --> 01:13:59,360 kind of computing system in the automotive space?
01:14:00,340 --> 01:14:01,660 I mean, there's two tricks to building
01:14:01,660 --> 01:14:02,800 like an automotive computer.
01:14:02,800 --> 01:14:07,360 One is the software team, the machine learning team
01:14:07,360 --> 01:14:10,680 is developing algorithms that are changing fast.
01:14:10,680 --> 01:14:14,320 So as you're building the accelerator,
01:14:14,320 --> 01:14:16,960 you have this, you know, worry or intuition
01:14:16,960 --> 01:14:18,560 that the algorithms will change enough
01:14:18,560 --> 01:14:22,680 that the accelerator will be the wrong one, right?
01:14:22,680 --> 01:14:25,040 And there's the generic thing, which is,
01:14:25,040 --> 01:14:27,280 if you build a really good general purpose computer,
01:14:27,280 --> 01:14:31,480 say its performance is one, and then GPU guys
01:14:31,480 --> 01:14:34,320 will deliver about five times the performance
01:14:34,320 --> 01:14:35,760 for the same amount of silicon,
01:14:35,760 --> 01:14:37,660 because instead of discovering parallelism,
01:14:37,660 --> 01:14:39,280 you're given parallelism.
01:14:39,280 --> 01:14:43,760 And then special accelerators get another two to five X
01:14:43,760 --> 01:14:46,080 on top of a GPU, because you say,
01:14:46,080 --> 01:14:49,080 I know the math is always eight bit integers
01:14:49,080 --> 01:14:52,240 into 32 bit accumulators, and the operations
01:14:52,240 --> 01:14:55,240 are the subset of mathematical possibilities.
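(A pure-Python sketch of the narrow math an accelerator commits to, as described above: 8-bit integer multiplies accumulated into a 32-bit register. This illustrates the datapath only; it is not any real chip's design.)

```python
# Sketch of a quantized multiply-accumulate (MAC): int8 operands,
# products widened into a 32-bit-style accumulator so they can't overflow.

def to_int8(v):
    """Wrap a value into signed 8-bit range, like a hardware register."""
    return ((v + 128) % 256) - 128

def int8_dot(a, b):
    """Dot product with int8 inputs and a wide accumulator, the core
    operation of typical 8-bit inference hardware."""
    acc = 0  # stands in for the 32-bit accumulator
    for x, y in zip(a, b):
        acc += to_int8(x) * to_int8(y)  # widen before accumulating
    return acc

int8_dot([127, -128], [127, -128])  # 127*127 + (-128)*(-128) = 32513
```

Fixing the operand width and the operation set like this is precisely what buys the claimed 2-5x over a GPU, and what the "over-specialization" tension below is about.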
01:14:55,240 --> 01:15:00,000 So, you know, AI accelerators have a claimed
01:15:00,000 --> 01:15:01,800 performance benefit over GPUs,
01:15:01,800 --> 01:15:05,120 because in the narrow math space,
01:15:05,120 --> 01:15:07,160 you're nailing the algorithm.
01:15:07,160 --> 01:15:10,080 Now, you still try to make it programmable,
01:15:10,080 --> 01:15:13,320 but the AI field is changing really fast.
01:15:13,320 --> 01:15:17,360 So there's a little creative tension there of,
01:15:17,360 --> 01:15:20,660 I want the acceleration afforded by specialization
01:15:20,660 --> 01:15:22,200 without being over specialized,
01:15:22,200 --> 01:15:25,640 so that the new algorithm is so much more effective
01:15:25,640 --> 01:15:28,000 that you'd have been better off on a GPU.
01:15:28,000 --> 01:15:30,040 So there's a tension there.
01:15:30,040 --> 01:15:33,040 To build a good computer for an application
01:15:33,040 --> 01:15:36,280 like automotive, there's all kinds of sensor inputs
01:15:36,280 --> 01:15:39,160 and safety processors and a bunch of stuff.
01:15:39,160 --> 01:15:42,280 So one of Elon's goals is to make it super affordable.
01:15:42,280 --> 01:15:44,880 So every car gets an autopilot computer.
01:15:44,880 --> 01:15:46,560 So some of the recent startups you look at,
01:15:46,560 --> 01:15:48,400 and they have a server in the trunk,
01:15:48,400 --> 01:15:49,720 because they're saying, I'm gonna build
01:15:49,720 --> 01:15:52,560 this autopilot computer, replaces the driver.
01:15:52,560 --> 01:15:55,280 So their cost budget's 10 or $20,000.
01:15:55,280 --> 01:15:58,800 And Elon's constraint was, I'm gonna put one in every car,
01:15:58,800 --> 01:16:01,760 whether people buy autonomous driving or not.
01:16:01,760 --> 01:16:05,300 So the cost constraint he had in mind was great, right?
01:16:05,300 --> 01:16:08,440 And to hit that, you had to think about the system design.
01:16:08,440 --> 01:16:09,920 That's complicated, and it's fun.
01:16:09,920 --> 01:16:12,600 You know, it's like, it's like, it's craftsman's work.
01:16:12,600 --> 01:16:14,280 Like, you know, a violin maker, right?
01:16:14,280 --> 01:16:16,800 You can say, Stradivarius is this incredible thing,
01:16:16,800 --> 01:16:18,520 the musicians are incredible.
01:16:18,520 --> 01:16:20,520 But the guy making the violin, you know,
01:16:20,520 --> 01:16:24,040 picked wood and sanded it, and then he cut it,
01:16:24,040 --> 01:16:26,000 you know, and he glued it, you know,
01:16:26,000 --> 01:16:27,960 and he waited for the right day
01:16:27,960 --> 01:16:29,560 so that when he put the finish on it,
01:16:29,560 --> 01:16:31,680 it didn't, you know, do something dumb.
01:16:31,680 --> 01:16:33,920 That's craftsman's work, right?
01:16:33,920 --> 01:16:35,560 You may be a genius craftsman
01:16:35,560 --> 01:16:36,880 because you have the best techniques
01:16:36,880 --> 01:16:38,880 and you discover a new one,
01:16:38,880 --> 01:16:41,980 but most engineers, craftsman's work.
01:16:41,980 --> 01:16:44,320 And humans really like to do that.
01:16:44,320 --> 01:16:45,160 You know the expression?
01:16:45,160 --> 01:16:46,000 Smart humans.
01:16:46,000 --> 01:16:46,840 No, everybody.
01:16:46,840 --> 01:16:47,680 All humans.
01:16:47,680 --> 01:16:50,400 I don't know, I used to, I dug ditches when I was in college.
01:16:50,400 --> 01:16:51,480 I got really good at it.
01:16:51,480 --> 01:16:52,660 Satisfying.
01:16:52,660 --> 01:16:53,500 Yeah.
01:16:53,500 --> 01:16:54,320 So.
01:16:54,320 --> 01:16:55,480 Digging ditches is also craftsman's work.
01:16:55,480 --> 01:16:57,000 Yeah, of course.
01:16:57,000 --> 01:17:00,940 So there's an expression called complex mastery behavior.
01:17:00,940 --> 01:17:02,560 So when you're learning something, that's fun,
01:17:02,560 --> 01:17:04,120 because you're learning something.
01:17:04,120 --> 01:17:05,800 When you do something, it's relatively simple.
01:17:05,800 --> 01:17:06,740 It's not that satisfying.
01:17:06,740 --> 01:17:10,400 But if the steps that you have to do are complicated
01:17:10,400 --> 01:17:13,540 and you're good at them, it's satisfying to do them.
01:17:14,640 --> 01:17:16,880 And then if you're intrigued by it all,
01:17:16,880 --> 01:17:19,560 as you're doing them, you sometimes learn new things
01:17:19,560 --> 01:17:21,640 that you can raise your game,
01:17:21,640 --> 01:17:23,800 but craftsman's work is good.
01:17:23,800 --> 01:17:27,120 And engineers, like engineering is complicated enough
01:17:27,120 --> 01:17:28,840 that you have to learn a lot of skills,
01:17:28,840 --> 01:17:32,400 and then a lot of what you do is then craftsman's work,
01:17:32,400 --> 01:17:33,520 which is fun.
01:17:33,520 --> 01:17:37,520 Autonomous driving, building a very resource-constrained
01:17:37,520 --> 01:17:39,560 computer, so a computer has to be cheap enough
01:17:39,560 --> 01:17:41,160 to put in every single car.
01:17:41,160 --> 01:17:45,120 That essentially boils down to craftsman's work.
01:17:45,120 --> 01:17:46,960 It's engineering, it's innovation.
01:17:46,960 --> 01:17:47,780 You know, there's thoughtful decisions
01:17:47,780 --> 01:17:50,640 and problems to solve and trade-offs to make.
01:17:50,640 --> 01:17:52,560 Do you need 10 camera inputs or eight?
01:17:52,560 --> 01:17:54,560 You know, are you building for the current car
01:17:54,560 --> 01:17:56,080 or the next one?
01:17:56,080 --> 01:17:57,960 You know, how do you do the safety stuff?
01:17:57,960 --> 01:18:00,680 You know, there's a whole bunch of details.
01:18:00,680 --> 01:18:03,940 But it's fun, but it's not like I'm building a new type
01:18:03,940 --> 01:18:06,080 of neural network which has a new mathematics
01:18:06,080 --> 01:18:08,080 and a new computer to work.
01:18:08,080 --> 01:18:11,540 You know, that's, like there's more invention than that.
01:18:12,440 --> 01:18:14,160 But the reduction to practice,
01:18:14,160 --> 01:18:16,160 once you pick the architecture, you look inside
01:18:16,160 --> 01:18:17,120 and what do you see?
01:18:17,120 --> 01:18:20,400 Adders and multipliers and memories and, you know,
01:18:20,400 --> 01:18:21,240 the basics.
01:18:21,240 --> 01:18:25,660 So computers is always this weird set of abstraction layers
01:18:25,660 --> 01:18:29,380 of ideas and thinking that reduction to practice
01:18:29,380 --> 01:18:33,820 is transistors and wires and, you know, pretty basic stuff.
01:18:33,820 --> 01:18:37,120 And that's an interesting phenomena.
01:18:37,120 --> 01:18:40,080 By the way, like factory work, like lots of people think
01:18:40,080 --> 01:18:42,320 factory work is rote assembly stuff.
01:18:42,320 --> 01:18:44,200 I've been on the assembly line.
01:18:44,200 --> 01:18:46,320 Like the people who work there really like it.
01:18:46,320 --> 01:18:47,920 It's a really great job.
01:18:47,920 --> 01:18:48,800 It's really complicated.
01:18:48,800 --> 01:18:50,960 Putting cars together is hard, right?
01:18:50,960 --> 01:18:53,480 And the car is moving and the parts are moving
01:18:53,480 --> 01:18:55,020 and sometimes the parts are damaged
01:18:55,020 --> 01:18:57,600 and you have to coordinate putting all the stuff together
01:18:57,600 --> 01:18:59,120 and people are good at it.
01:18:59,120 --> 01:19:00,400 They're good at it.
01:19:00,400 --> 01:19:01,800 And I remember one day I went to work
01:19:01,800 --> 01:19:04,000 and the line was shut down for some reason
01:19:04,000 --> 01:19:06,800 and some of the guys sitting around were really bummed
01:19:06,800 --> 01:19:09,280 because they had reorganized a bunch of stuff
01:19:09,280 --> 01:19:10,800 and they were gonna hit a new record
01:19:10,800 --> 01:19:12,800 for the number of cars built that day
01:19:12,800 --> 01:19:14,240 and they were all gung ho to do it.
01:19:14,240 --> 01:19:17,840 And these were big, tough buggers and, you know,
01:19:17,840 --> 01:19:20,240 but what they did was complicated and you couldn't do it.
01:19:20,240 --> 01:19:21,400 Yeah, and I mean.
01:19:21,400 --> 01:19:22,800 Well, after a while you could,
01:19:22,800 --> 01:19:24,240 but you'd have to work your way up
01:19:24,240 --> 01:19:27,280 because, you know, like putting the bright,
01:19:27,280 --> 01:19:31,000 what's called the brights, the trim on a car
01:19:31,000 --> 01:19:32,640 on a moving assembly line
01:19:32,640 --> 01:19:34,600 where it has to be attached 25 places
01:19:34,600 --> 01:19:38,080 in a minute and a half is unbelievably complicated
01:19:39,240 --> 01:19:42,520 and human beings can do it, it's really good.
01:19:42,520 --> 01:19:45,280 I think that's harder than driving a car, by the way.
01:19:45,280 --> 01:19:47,080 Putting together, working.
01:19:47,080 --> 01:19:48,600 Working on a factory.
01:19:48,600 --> 01:19:51,400 Two smart people can disagree.
01:19:51,400 --> 01:19:52,240 Yay.
01:19:52,240 --> 01:19:54,480 I think driving a car.
01:19:54,480 --> 01:19:56,160 We'll get you in the factory someday
01:19:56,160 --> 01:19:57,520 and then we'll see how you do.
01:19:57,520 --> 01:19:59,520 No, not for us humans driving a car is easy.
01:19:59,520 --> 01:20:03,080 I'm saying building a machine that drives a car
01:20:03,080 --> 01:20:04,320 is not easy.
01:20:04,320 --> 01:20:05,440 No, okay.
01:20:05,440 --> 01:20:07,440 Driving a car is easy for humans
01:20:07,440 --> 01:20:10,840 because we've been evolving for billions of years.
01:20:10,840 --> 01:20:11,680 To drive cars.
01:20:11,680 --> 01:20:13,320 Yeah, I noticed that.
01:20:13,320 --> 01:20:15,640 The pale of the cars are super cool.
01:20:16,640 --> 01:20:19,880 No, now you join the rest of the internet in mocking me.
01:20:19,880 --> 01:20:20,720 Okay.
01:20:20,720 --> 01:20:24,240 I wasn't mocking, I was just intrigued
01:20:24,240 --> 01:20:29,000 by your anthropology, I'll have to go dig into that.
01:20:29,000 --> 01:20:31,160 There's some inaccuracies there, yes.
01:20:31,160 --> 01:20:36,160 Okay, but in general, what have you learned
01:20:38,000 --> 01:20:43,000 in terms of thinking about passion, craftsmanship,
01:20:44,080 --> 01:20:49,080 tension, chaos, the whole mess of it?
01:20:49,080 --> 01:20:54,080 What have you learned, or taken away, from your time
01:20:54,320 --> 01:20:57,080 working with Elon Musk, working at Tesla,
01:20:57,080 --> 01:21:02,080 which is known to be a place of chaos, innovation,
01:21:02,680 --> 01:21:03,720 craftsmanship and all of those things?
01:21:03,720 --> 01:21:06,080 I really like the way you thought.
01:21:06,080 --> 01:21:07,760 You think you have an understanding
01:21:07,760 --> 01:21:10,080 about what first principles of something is
01:21:10,080 --> 01:21:11,720 and then you talk to Elon about it
01:21:11,720 --> 01:21:13,980 and you realize you didn't scratch the surface.
01:21:15,560 --> 01:21:18,440 He has a deep belief that no matter what you do,
01:21:18,440 --> 01:21:21,240 it's a local maximum, right?
01:21:21,240 --> 01:21:24,320 And I had a friend, he invented a better electric motor
01:21:24,320 --> 01:21:27,000 and it was like a lot better than what we were using.
01:21:27,000 --> 01:21:28,160 And one day he came by, he said,
01:21:28,160 --> 01:21:30,120 you know, I'm a little disappointed
01:21:30,120 --> 01:21:31,960 because this is really great
01:21:31,960 --> 01:21:33,320 and you didn't seem that impressed.
01:21:33,320 --> 01:21:37,320 And I said, you know, when the super intelligent aliens come,
01:21:37,320 --> 01:21:38,960 are they gonna be looking for you?
01:21:38,960 --> 01:21:39,880 Like, where is he?
01:21:39,880 --> 01:21:41,960 The guy who built the motor.
01:21:41,960 --> 01:21:43,260 Yeah, probably not.
01:21:43,260 --> 01:21:48,260 But doing interesting work that's both innovative
01:21:49,460 --> 01:21:51,860 and let's say craftsman's work on the current thing
01:21:51,860 --> 01:21:54,260 is really satisfying and it's good.
01:21:54,260 --> 01:21:55,180 And that's cool.
01:21:55,180 --> 01:21:59,100 And then Elon was good at taking everything apart
01:21:59,100 --> 01:22:01,720 and like, what's the deep first principle?
01:22:01,720 --> 01:22:04,300 Oh, no, what's really, no, what's really new?
01:22:04,300 --> 01:22:08,140 You know, that ability to look at it
01:22:08,140 --> 01:22:13,140 without assumptions and without constraints is super wild.
01:22:14,300 --> 01:22:17,820 You know, he built a rocket ship and an electric car
01:22:17,820 --> 01:22:20,900 and everything and that's super fun.
01:22:20,900 --> 01:22:21,900 And he's into it too.
01:22:21,900 --> 01:22:26,180 Like when SpaceX first landed two rockets, at Tesla
01:22:26,180 --> 01:22:28,040 we had a video projector in the big room
01:22:28,040 --> 01:22:29,860 and like 500 people came down
01:22:29,860 --> 01:22:31,340 and when they landed, everybody cheered
01:22:31,340 --> 01:22:32,700 and some people cried.
01:22:32,700 --> 01:22:33,780 It was so cool.
01:22:34,740 --> 01:22:36,300 All right, but how did you do that?
01:22:36,300 --> 01:22:39,440 Well, it was super hard.
01:22:40,580 --> 01:22:42,220 And then people say, well, it's chaotic.
01:22:42,220 --> 01:22:43,060 Really?
01:22:43,060 --> 01:22:44,620 To get out of all your assumptions,
01:22:44,620 --> 01:22:47,180 you think that's not gonna be unbelievably painful?
01:22:48,180 --> 01:22:50,100 And is Elon tough?
01:22:50,100 --> 01:22:51,420 Yeah, probably.
01:22:51,420 --> 01:22:53,300 Do people look back on it and say,
01:22:53,300 --> 01:22:57,540 boy, I'm really happy I had that experience
01:22:57,540 --> 01:23:01,740 to go take apart that many layers of assumptions?
01:23:02,900 --> 01:23:05,380 Sometimes super fun, sometimes painful.
01:23:05,380 --> 01:23:07,940 So it could be emotionally and intellectually painful.
01:23:07,940 --> 01:23:10,900 That whole process is just stripping away assumptions.
01:23:10,900 --> 01:23:13,380 Yeah, imagine 99% of your thought process
01:23:13,380 --> 01:23:15,420 is protecting your self-conception.
01:23:16,580 --> 01:23:18,700 And 98% of that's wrong.
01:23:20,140 --> 01:23:21,540 Now you got the math right?
01:23:22,620 --> 01:23:23,660 How do you think you're feeling
01:23:23,660 --> 01:23:26,820 when you get back into that one bit that's useful
01:23:26,820 --> 01:23:28,580 and now you're open and you have the ability
01:23:28,580 --> 01:23:30,680 to do something different?
01:23:32,700 --> 01:23:33,700 I don't know if I got the math right.
01:23:33,700 --> 01:23:37,460 It might be 99.9, but it ain't 50.
01:23:38,740 --> 01:23:42,420 Imagining it, the 50% is hard enough.
01:23:42,420 --> 01:23:43,260 Yeah.
01:23:44,220 --> 01:23:47,060 Now for a long time, I've suspected you could get better.
01:23:48,460 --> 01:23:50,740 Like you can think better, you can think more clearly,
01:23:50,740 --> 01:23:52,100 you can take things apart.
01:23:52,980 --> 01:23:55,400 And there's lots of examples of that.
01:23:55,400 --> 01:23:56,440 People who do that.
01:23:58,380 --> 01:23:59,220 So.
01:23:59,220 --> 01:24:01,020 And Elon is an example of that.
01:24:01,020 --> 01:24:02,180 Apparently. You are an example.
01:24:02,180 --> 01:24:05,560 I don't know if I am, I'm fun to talk to.
01:24:06,580 --> 01:24:07,420 Certainly.
01:24:07,420 --> 01:24:08,620 I've learned a lot of stuff.
01:24:08,620 --> 01:24:09,460 Right.
01:24:09,460 --> 01:24:10,500 Well, here's the other thing is like,
01:24:10,500 --> 01:24:13,900 I joke like I read books and people think,
01:24:13,900 --> 01:24:14,740 oh, you read books.
01:24:14,740 --> 01:24:19,740 Well, no, I've read a couple of books a week for 55 years.
01:24:19,860 --> 01:24:20,700 Wow.
01:24:20,700 --> 01:24:22,180 Well, maybe 50, because I didn't
01:24:22,180 --> 01:24:24,700 learn to read until I was eight or something.
01:24:24,700 --> 01:24:28,540 And it turns out when people write books,
01:24:28,540 --> 01:24:31,280 they often take 20 years of their life
01:24:31,280 --> 01:24:33,340 where they passionately did something,
01:24:33,340 --> 01:24:36,100 and reduce it to 200 pages.
01:24:36,100 --> 01:24:37,500 That's kind of fun.
01:24:37,500 --> 01:24:39,820 And then you go online and you can find out
01:24:39,820 --> 01:24:42,460 who wrote the best books and who like, you know,
01:24:42,460 --> 01:24:43,380 that's kind of wild.
01:24:43,380 --> 01:24:45,220 So there's this wild selection process
01:24:45,220 --> 01:24:46,060 and then you can read it.
01:24:46,060 --> 01:24:48,660 And for the most part, understand it.
01:24:49,900 --> 01:24:51,980 And then you can go apply it.
01:24:51,980 --> 01:24:53,020 Like I went to one company,
01:24:53,020 --> 01:24:55,120 I thought I haven't managed much before.
01:24:55,120 --> 01:24:57,300 So I read 20 management books
01:24:57,300 --> 01:24:58,740 and I started talking to them
01:24:58,740 --> 01:25:01,460 and basically compared to all the VPs running around,
01:25:01,460 --> 01:25:05,460 I'd read 19 more management books than anybody else.
01:25:05,460 --> 01:25:08,660 It wasn't even that hard.
01:25:08,660 --> 01:25:11,220 And half the stuff worked like first time.
01:25:11,220 --> 01:25:12,700 It wasn't even rocket science.
01:25:13,600 --> 01:25:17,020 But at the core of that is questioning the assumptions
01:25:17,020 --> 01:25:20,060 or sort of entering the thinking,
01:25:20,060 --> 01:25:21,820 first principles thinking,
01:25:21,820 --> 01:25:24,940 sort of looking at the reality of the situation
01:25:24,940 --> 01:25:28,260 and using that knowledge, applying that knowledge.
01:25:28,260 --> 01:25:31,420 So I would say my brain has this idea
01:25:31,420 --> 01:25:34,300 that you can question first assumptions.
01:25:35,280 --> 01:25:38,300 But I can go days at a time and forget that.
01:25:38,300 --> 01:25:41,500 And you have to kind of like circle back to that observation.
01:25:42,540 --> 01:25:45,180 Because it is emotionally challenging.
01:25:45,180 --> 01:25:47,340 Well, it's hard to just keep it front and center
01:25:47,340 --> 01:25:50,420 because you operate on so many levels all the time
01:25:50,420 --> 01:25:53,500 and getting this done takes priority
01:25:53,500 --> 01:25:56,540 or being happy takes priority
01:25:56,540 --> 01:25:59,420 or screwing around takes priority.
01:25:59,420 --> 01:26:03,060 Like how you go through life is complicated.
01:26:03,060 --> 01:26:04,380 And then you remember, oh yeah,
01:26:04,380 --> 01:26:06,500 I could really think first principles.
01:26:06,500 --> 01:26:08,300 Oh shit, that's tiring.
01:26:09,620 --> 01:26:12,760 But you do for a while and that's kind of cool.
01:26:12,760 --> 01:26:16,200 So just as a last question in your sense
01:26:16,200 --> 01:26:19,500 from the big picture from the first principles,
01:26:19,500 --> 01:26:21,540 do you think, you kind of answered it already,
01:26:21,540 --> 01:26:24,340 but do you think autonomous driving
01:26:24,340 --> 01:26:28,740 is something we can solve on a timeline of years?
01:26:28,740 --> 01:26:33,740 So one, two, three, five, 10 years as opposed to a century.
01:26:33,900 --> 01:26:35,420 Yeah, definitely.
01:26:35,420 --> 01:26:37,460 Just to linger on it a little longer,
01:26:37,460 --> 01:26:40,140 where's the confidence coming from?
01:26:40,140 --> 01:26:42,660 Is it the fundamentals of the problem,
01:26:42,660 --> 01:26:46,420 the fundamentals of building the hardware and the software?
01:26:46,420 --> 01:26:51,420 As a computational problem, understanding ballistics, roads,
01:26:51,420 --> 01:26:56,420 topography, it seems pretty solvable.
01:26:56,540 --> 01:26:59,740 I mean, and you can see this, like speech recognition
01:26:59,740 --> 01:27:01,720 for a long time, people were doing frequency-
01:27:01,720 --> 01:27:04,400 domain analysis and all kinds of stuff
01:27:04,400 --> 01:27:07,300 and that didn't work at all, right?
01:27:07,300 --> 01:27:10,400 And then they did deep learning about it and it worked great.
01:27:11,380 --> 01:27:13,420 And it took multiple iterations
01:27:14,340 --> 01:27:18,180 and autonomous driving is way past
01:27:18,180 --> 01:27:19,860 the frequency analysis point.
01:27:19,860 --> 01:27:23,100 Use radar, don't run into things.
01:27:23,940 --> 01:27:25,500 And the data gathering is going up
01:27:25,500 --> 01:27:26,900 and the computation is going up
01:27:26,900 --> 01:27:28,660 and the algorithm understanding is going up
01:27:28,660 --> 01:27:30,060 and there's a whole bunch of problems
01:27:30,060 --> 01:27:32,020 getting solved like that.
01:27:32,020 --> 01:27:33,560 The data side is really powerful,
01:27:33,560 --> 01:27:35,820 but I disagree with both you and Elon.
01:27:35,820 --> 01:27:38,620 I'll tell Elon once again, as I did before,
01:27:38,620 --> 01:27:42,420 that when you add human beings into the picture,
01:27:43,420 --> 01:27:45,740 it's no longer a ballistics problem.
01:27:45,740 --> 01:27:47,540 It's something more complicated,
01:27:47,540 --> 01:27:50,420 but I could be very well proven wrong.
01:27:50,420 --> 01:27:53,100 Cars are highly damped in terms of rate of change.
01:27:53,960 --> 01:27:56,700 Like the steering system's really slow
01:27:56,700 --> 01:27:57,720 compared to a computer.
01:27:57,720 --> 01:28:01,080 The acceleration of the acceleration is really slow.
01:28:01,080 --> 01:28:04,220 Yeah, on a certain timescale, on a ballistics timescale,
01:28:04,220 --> 01:28:05,820 but human behavior, I don't know.
01:28:05,820 --> 01:28:08,080 I shouldn't say.
01:28:08,080 --> 01:28:09,860 Human beings are really slow too.
01:28:09,860 --> 01:28:14,020 Weirdly, we operate half a second behind reality.
01:28:14,020 --> 01:28:15,380 I don't know if anybody really understands that one either.
01:28:15,380 --> 01:28:16,500 It's pretty funny.
01:28:16,500 --> 01:28:18,220 Yeah, yeah.
01:28:20,460 --> 01:28:23,660 We very well could be surprised,
01:28:23,660 --> 01:28:25,220 and I think with the rate of improvement
01:28:25,220 --> 01:28:26,940 on all aspects on both the compute
01:28:26,940 --> 01:28:29,740 and the software and the hardware,
01:28:29,740 --> 01:28:32,620 there's gonna be pleasant surprises all over the place.
01:28:34,740 --> 01:28:36,780 Speaking of unpleasant surprises,
01:28:36,780 --> 01:28:39,580 many people have worries about a singularity
01:28:39,580 --> 01:28:41,720 in the development of AI.
01:28:41,720 --> 01:28:43,220 Forgive me for such questions.
01:28:43,220 --> 01:28:44,500 Yeah.
01:28:44,500 --> 01:28:46,860 When AI improves exponentially and reaches a point
01:28:46,860 --> 01:28:49,860 of superhuman level general intelligence,
01:28:51,340 --> 01:28:53,380 beyond that point, there's no looking back.
01:28:53,380 --> 01:28:56,160 Do you share this worry of existential threats
01:28:56,160 --> 01:28:57,420 from artificial intelligence,
01:28:57,420 --> 01:29:00,820 from computers becoming superhuman level intelligent?
01:29:01,980 --> 01:29:02,920 No, not really.
01:29:04,660 --> 01:29:07,580 We already have a very stratified society,
01:29:07,580 --> 01:29:09,420 and then if you look at the whole animal kingdom
01:29:09,420 --> 01:29:12,620 of capabilities and abilities and interests,
01:29:12,620 --> 01:29:15,340 and smart people have their niche,
01:29:15,340 --> 01:29:17,820 and normal people have their niche,
01:29:17,820 --> 01:29:19,700 and craftsmen have their niche,
01:29:19,700 --> 01:29:22,580 and animals have their niche.
01:29:22,580 --> 01:29:26,060 I suspect that the domains of interest
01:29:26,060 --> 01:29:29,500 for things that are astronomically different,
01:29:29,500 --> 01:29:32,340 like the whole idea that something got 10 times smarter than us
01:29:32,340 --> 01:29:34,740 and wanted to track us all down because what?
01:29:34,740 --> 01:29:36,980 We like to have coffee at Starbucks?
01:29:36,980 --> 01:29:38,940 Like, it doesn't seem plausible.
01:29:38,940 --> 01:29:40,740 No, is there an existential problem
01:29:40,740 --> 01:29:42,580 that how do you live in a world
01:29:42,580 --> 01:29:44,140 where there's something way smarter than you,
01:29:44,140 --> 01:29:46,500 and you based your kind of self-esteem
01:29:46,500 --> 01:29:48,940 on being the smartest local person?
01:29:48,940 --> 01:29:52,600 Well, there's what, 0.1% of the population who thinks that?
01:29:52,600 --> 01:29:54,900 Because the rest of the population's been dealing with it
01:29:54,900 --> 01:29:56,780 since they were born.
01:29:56,780 --> 01:30:01,020 So the breadth of possible experience
01:30:01,020 --> 01:30:03,720 that can be interesting is really big.
01:30:03,720 --> 01:30:10,120 And, you know, superintelligence seems likely,
01:30:10,120 --> 01:30:13,280 although we still don't know if we're magical,
01:30:13,280 --> 01:30:15,440 but I suspect we're not.
01:30:15,440 --> 01:30:18,040 And it seems likely that it'll create possibilities
01:30:18,040 --> 01:30:20,040 that are interesting for us,
01:30:20,040 --> 01:30:23,720 and its interests will be interesting for that,
01:30:23,720 --> 01:30:26,000 for whatever it is.
01:30:26,000 --> 01:30:28,120 It's not obvious why its interests
01:30:28,120 --> 01:30:31,600 would somehow want to fight over some square foot of dirt
01:30:31,600 --> 01:30:36,600 or whatever the usual fears are about.
01:30:37,720 --> 01:30:39,040 So you don't think it'll inherit
01:30:39,040 --> 01:30:41,340 some of the darker aspects of human nature?
01:30:42,200 --> 01:30:45,280 Depends on how you think reality's constructed.
01:30:45,280 --> 01:30:49,760 So for whatever reason, human beings are in,
01:30:49,760 --> 01:30:52,360 let's say, creative tension and opposition
01:30:52,360 --> 01:30:55,440 with both our good and bad forces.
01:30:55,440 --> 01:30:58,240 Like, there's lots of philosophical understanding to that.
01:30:58,240 --> 01:31:00,400 Right?
01:31:00,400 --> 01:31:03,200 I don't know why that would be different.
01:31:03,200 --> 01:31:06,720 So you think the evil is necessary for the good?
01:31:06,720 --> 01:31:08,220 I mean, the tension.
01:31:08,220 --> 01:31:09,120 I don't know about evil,
01:31:09,120 --> 01:31:11,640 but like we live in a competitive world
01:31:11,640 --> 01:31:16,640 where your good is somebody else's evil.
01:31:16,680 --> 01:31:19,320 You know, there's the malignant part of it,
01:31:19,320 --> 01:31:22,760 but that seems to be self-limiting,
01:31:22,760 --> 01:31:26,320 although occasionally it's super horrible.
01:31:26,320 --> 01:31:30,000 But yes, there's a debate over ideas
01:31:30,000 --> 01:31:32,360 and some people have different beliefs
01:31:32,360 --> 01:31:34,600 and that debate itself is a process.
01:31:34,600 --> 01:31:37,560 So the arriving at something.
01:31:37,560 --> 01:31:39,360 Yeah, and why wouldn't that continue?
01:31:39,360 --> 01:31:40,200 Yeah.
01:31:41,600 --> 01:31:43,160 But you don't think that whole process
01:31:43,160 --> 01:31:46,140 will leave humans behind in a way that's painful?
01:31:47,440 --> 01:31:48,680 Emotionally painful, yes.
01:31:48,680 --> 01:31:51,040 For the 0.1%, they'll be.
01:31:51,040 --> 01:31:52,360 Why isn't it already painful
01:31:52,360 --> 01:31:54,080 for a large percentage of the population?
01:31:54,080 --> 01:31:54,920 And it is.
01:31:54,920 --> 01:31:57,880 I mean, society does have a lot of stress in it,
01:31:57,880 --> 01:32:00,680 about the 1% and about the this and about the that,
01:32:00,680 --> 01:32:03,760 but you know, everybody has a lot of stress in their life
01:32:03,760 --> 01:32:05,240 about what they find satisfying
01:32:05,240 --> 01:32:09,760 and you know, know yourself seems to be the proper dictum
01:32:10,800 --> 01:32:14,240 and pursue something that makes your life meaningful
01:32:14,240 --> 01:32:15,200 seems proper.
01:32:16,280 --> 01:32:18,720 And there's so many avenues on that.
01:32:18,720 --> 01:32:21,120 Like, there's so much unexplored space
01:32:21,120 --> 01:32:22,580 at every single level.
01:32:22,580 --> 01:32:27,340 So, you know, I'm somewhat of,
01:32:27,340 --> 01:32:29,700 my nephew called me a jaded optimist.
01:32:29,700 --> 01:32:33,900 And you know, so it's.
01:32:33,900 --> 01:32:37,220 There's a beautiful tension in that label.
01:32:37,220 --> 01:32:41,020 But if you were to look back at your life
01:32:41,020 --> 01:32:45,860 and could relive a moment, a set of moments,
01:32:45,860 --> 01:32:49,300 because there were the happiest times of your life
01:32:49,300 --> 01:32:52,660 outside of family, what would that be?
01:32:54,740 --> 01:32:56,740 I don't want to relive any moments.
01:32:56,740 --> 01:32:58,100 I like that.
01:32:58,100 --> 01:33:01,420 I like that situation where you have some amount of optimism
01:33:01,420 --> 01:33:04,920 and then the anxiety of the unknown.
01:33:06,140 --> 01:33:10,180 So you love the unknown, the mystery of it.
01:33:10,180 --> 01:33:11,300 I don't know about the mystery.
01:33:11,300 --> 01:33:13,000 It sure gets your blood pumping.
01:33:14,120 --> 01:33:17,180 What do you think is the meaning of this whole thing?
01:33:17,180 --> 01:33:20,700 Of life on this pale blue dot?
01:33:21,820 --> 01:33:23,940 It seems to be what it does.
01:33:25,340 --> 01:33:29,340 Like the universe, for whatever reason,
01:33:29,340 --> 01:33:32,860 makes atoms, which makes us, which we do stuff.
01:33:34,420 --> 01:33:38,100 And we figure out things and we explore things and.
01:33:38,100 --> 01:33:39,900 That's just what it is.
01:33:39,900 --> 01:33:41,660 It's not just.
01:33:41,660 --> 01:33:43,620 Yeah, it is.
01:33:44,620 --> 01:33:46,940 Jim, I don't think there's a better place to end it.
01:33:46,940 --> 01:33:50,180 It's a huge honor and.
01:33:50,180 --> 01:33:51,260 Well, that was super fun.
01:33:51,260 --> 01:33:52,580 Thank you so much for talking today.
01:33:52,580 --> 01:33:54,140 All right, great.
01:33:54,140 --> 01:33:56,260 Thanks for listening to this conversation
01:33:56,260 --> 01:33:59,420 and thank you to our presenting sponsor, Cash App.
01:33:59,420 --> 01:34:02,100 Download it, use code LexPodcast.
01:34:02,100 --> 01:34:04,900 You'll get $10 and $10 will go to FIRST,
01:34:04,900 --> 01:34:07,700 a STEM education nonprofit that inspires hundreds
01:34:07,700 --> 01:34:10,820 of thousands of young minds to become future leaders
01:34:10,820 --> 01:34:12,260 and innovators.
01:34:12,260 --> 01:34:15,100 If you enjoy this podcast, subscribe on YouTube,
01:34:15,100 --> 01:34:18,340 get five stars on Apple Podcast, follow on Spotify,
01:34:18,340 --> 01:34:22,380 support on Patreon, or simply connect with me on Twitter.
01:34:22,380 --> 01:34:24,860 And now let me leave you with some words of wisdom
01:34:24,860 --> 01:34:26,940 from Gordon Moore.
01:34:26,940 --> 01:34:30,980 If everything you try works, you aren't trying hard enough.
01:34:30,980 --> 01:34:43,980 Thank you for listening and hope to see you next time.