Training Data

Sequoia Capital

Join us as we train our neural nets on the theme of the century: AI. Sonya Huang, Pat Grady and more Sequoia Capital partners host conversations with leading AI builders and researchers to ask critical questions and develop a deeper understanding of the evolving technologies, and their implications for technology, business and society. The content of this podcast does not constitute investment advice, an offer to provide investment advisory services, or an offer to sell or solicitation of an offer to buy an interest in any investment fund.
Technology
Business

Episodes

OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better
Oct 2 2024
Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks. Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the a-ha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws and what to expect as the model gets better.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.
Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.
Agent57: Outperforming the human Atari benchmark, 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”
Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol where it made a move so surprising that Sedol thought it must be a mistake, and only later discovered he had lost the game to a superhuman move.
IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.
System 1, System 2: The thesis of Daniel Kahneman’s pivotal book of behavioral economics, Thinking, Fast and Slow, which posited two distinct modes of thought, with System 1 being fast and instinctive and System 2 being slow and rational.
AlphaZero: The successor to AlphaGo, which learned a variety of games completely from scratch through self-play. Interestingly, self-play doesn’t seem to have a role in o1.
Solving Rubik’s Cube with a robot hand: Early OpenAI robotics paper that Ilge Akkaya worked on.
The Last Question: Science fiction story by Isaac Asimov with interesting parallels to scaling inference-time compute.
Strawberry: Why?
o1-mini: A smaller, more efficient version of o1 for applications that require reasoning without broad world knowledge.
00:00 - Introduction
01:33 - Conviction in o1
04:24 - How o1 works
05:04 - What is reasoning?
07:02 - Lessons from gameplay
09:14 - Generation vs verification
10:31 - What is surprising about o1 so far
11:37 - The trough of disillusionment
14:03 - Applying deep RL
14:45 - o1’s AlphaGo moment?
17:38 - A-ha moments
21:10 - Why is o1 good at STEM?
24:10 - Capabilities vs usefulness
25:29 - Defining AGI
26:13 - The importance of reasoning
28:39 - Chain of thought
30:41 - Implication of inference-time scaling laws
35:10 - Bottlenecks to scaling test-time compute
38:46 - Biggest misunderstanding about o1?
41:13 - o1-mini
42:15 - How should founders think about o1?
Why Vlad Tenev and Tudor Achim of Harmonic Think AI Is About to Change Math—and Why It Matters
Sep 24 2024
Adding code to LLM training data is a known method of improving a model’s reasoning skills. But wouldn’t math, the basis of all reasoning, be even better? Up until recently, there just wasn’t enough usable data that describes mathematics to make this feasible. A few years ago, Vlad Tenev (also founder of Robinhood) and Tudor Achim noticed the rise of the community around an esoteric programming language called Lean that was gaining traction among mathematicians. The combination of that and the past decade’s rise of autoregressive models capable of fast, flexible learning made them think the time was now, and they founded Harmonic. Their mission is both lofty (mathematical superintelligence) and eminently practical (verifying all safety-critical software).
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
IMO and the Millennium Prize: Two significant global competitions Harmonic hopes to win (soon)
Riemann hypothesis: One of the most difficult unsolved math conjectures (and a Millennium Prize problem), most recently in the sights of MIT mathematician Larry Guth
Terry Tao: perhaps the greatest living mathematician and Vlad’s professor at UCLA
Lean: an open source functional language for code verification launched by Leonardo de Moura when at Microsoft Research in 2013 that powers the Lean Theorem Prover
mathlib: the largest math textbook in the world, all written in Lean
Metaculus: online prediction platform that tracks and scores thousands of forecasters
Minecraft Beaten in 20 Seconds: The video Vlad references as an analogy to AI math
Navier-Stokes equations: another important Millennium Prize math problem. Vlad considers this more tractable than Riemann
John von Neumann: Hungarian mathematician and polymath who made foundational contributions to computing, the Manhattan Project and game theory
Gottfried Wilhelm Leibniz: co-inventor of calculus and (remarkably) creator of the “universal characteristic,” a system for reasoning through a language of symbols and calculations, anticipating Lean and Harmonic by 350 years!
00:00 - Introduction
01:42 - Math is reasoning
06:16 - Studying with the world's greatest living mathematician
10:18 - What does the math community think of AI math?
15:11 - Recursive self-improvement
18:31 - What is Lean?
21:05 - Why now?
22:46 - Synthetic data is the fuel for the model
27:29 - How fast will your model get better?
29:45 - Exploring the frontiers of human knowledge
34:11 - Lightning round
Jim Fan on Nvidia’s Embodied AI Lab and Jensen Huang’s Prediction that All Robots will be Autonomous
Sep 17 2024
AI researcher Jim Fan has had a charmed career. He was OpenAI’s first intern before he did his PhD at Stanford with “godmother of AI,” Fei-Fei Li. He graduated into a research scientist position at Nvidia and now leads its Embodied AI “GEAR” group. The lab’s current work ranges from foundation models for humanoid robots to agents for virtual worlds. Jim describes a three-pronged data strategy for robotics, combining internet-scale data, simulation data and real world robot data. He believes that in the next few years it will be possible to create a “foundation agent” that can generalize across skills, embodiments and realities, both physical and virtual. He also supports Jensen Huang’s idea that “Everything that moves will eventually be autonomous.”
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Mentioned in this episode:
World of Bits: Early OpenAI project Jim worked on as an intern with Andrej Karpathy. Part of a bigger initiative called Universe
Fei-Fei Li: Jim’s PhD advisor at Stanford who founded the ImageNet project in 2010 that revolutionized the field of visual recognition, led the Stanford Vision Lab and just launched her own AI startup, World Labs
Project GR00T: Nvidia’s “moonshot effort” at a robotic foundation model, premiered at this year’s GTC
Thinking, Fast and Slow: Influential book by Daniel Kahneman that popularized some of his teaching from behavioral economics
Jetson Orin chip: The dedicated series of edge computing chips Nvidia is developing to power Project GR00T
Eureka: Project by Jim’s team that trained a five-finger robot hand to do pen spinning
MineDojo: A project Jim did when he first got to Nvidia that developed a platform for general purpose agents in the game of Minecraft. Won NeurIPS 2022 Outstanding Paper Award
ADI: artificial dog intelligence
Mamba: Selective State Space Models, an alternative architecture to Transformers that Jim is interested in
00:00 Introduction
01:35 Jim’s journey to embodied intelligence
04:53 The GEAR Group
07:32 Three kinds of data for robotics
10:32 A GPT-3 moment for robotics
16:05 Choosing the humanoid robot form factor
19:37 Specialized generalists
21:59 GR00T gets its own chip
23:35 Eureka and Isaac Sim
25:23 Why now for robotics?
28:53 Exploring virtual worlds
36:28 Implications for games
39:13 Is the virtual world in service of the physical world?
42:10 Alternative architectures to Transformers
44:15 Lightning round
Founder Eric Steinberger on Magic’s Counterintuitive Approach to Pursuing AGI
Sep 10 2024
There’s a new archetype in Silicon Valley: the AI researcher turned founder. Instead of tinkering in a garage, they write papers that earn them the right to collaborate with cutting-edge labs until they break out and start their own. This is the story of wunderkind Eric Steinberger, the founder and CEO of Magic.dev. Eric came to programming through his obsession with AI and caught the attention of DeepMind researchers as a high school student. In 2022 he realized that AGI was closer than he had previously thought and started Magic to automate the software engineering necessary to get there. Among his counterintuitive ideas are the need to train proprietary large models, that value will not accrue in the application layer and that the best agents will manage themselves. Eric also talks about Magic’s recent 100M token context window model and the HashHop eval they’re open sourcing.
Hosted by: Sonya Huang, Sequoia Capital
Mentioned in this episode:
David Silver: DeepMind researcher who led the AlphaGo team
Johannes Heinrich: a PhD student of Silver’s and DeepMind researcher who mentored Eric as a high schooler
Reinforcement Learning from Self-Play in Imperfect-Information Games: Johannes’s dissertation that inspired Eric
Noam Brown: DeepMind, Meta and now OpenAI reinforcement learning researcher who eventually collaborated with Eric and brought him to FAIR
ClimateScience: NGO that Eric co-founded in 2019 while a university student
Noam Shazeer: One of the original Transformers researchers at Google and founder of Character.AI
DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker: the first AI paper Eric ever tried to deeply understand
LTM-2-mini: Magic’s first 100M token context model, built using the HashHop eval (now available open source)
00:00 - Introduction
01:39 - Vienna-born wunderkind
04:56 - Working with Noam Brown
08:00 - “I can do two things. I cannot do three.”
10:37 - AGI to-do list
13:27 - Advice for young researchers
20:35 - Reading every paper voraciously
23:06 - The army of Noams
26:46 - The leaps still needed in research
29:59 - What is Magic?
36:12 - Competing against the 800-pound gorillas
38:21 - Ideal team size for researchers
40:10 - AI that feels like a colleague
44:30 - Lightning round
47:50 - Bonus round: 200M token context announcement
Sierra Co-Founder Clay Bavor on Making Customer-Facing AI Agents Delightful
Aug 27 2024
Customer service is hands down the first killer app of generative AI for businesses. The reasons are simple: the costs of existing solutions are so high, the satisfaction so low and the margin for ROI so wide. But trusting your interactions with customers to hallucination-prone LLMs can be daunting. Enter Sierra. Co-founder Clay Bavor walks us through the sophisticated engineering challenges his team solved along the way to delivering AI agents for all aspects of the customer experience that are delightful, safe and reliable, and that are being deployed widely by Sierra’s customers. The company’s AgentOS enables businesses to create branded AI agents to interact with customers, follow nuanced policies and even handle customer retention and upsell. Clay describes how companies can capture their brand voice, values and internal processes to create AI agents that truly represent the business.
Hosted by: Ravi Gupta and Pat Grady, Sequoia Capital
Mentioned in this episode:
Bret Taylor: co-founder of Sierra
Towards a Human-like Open-Domain Chatbot: 2020 Google paper that introduced Meena, a predecessor of ChatGPT (followed by LaMDA in 2021)
PaLM: Scaling Language Modeling with Pathways: 2022 Google paper about their unreleased 540B parameter transformer model (GPT-3, at the time, had 175B)
Avocado chair: Images generated by OpenAI’s DALL·E model in 2022
Large Language Models Understand and Can be Enhanced by Emotional Stimuli: 2023 Microsoft paper on how models like GPT-4 can be manipulated into providing better results
𝛕-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains: 2024 paper authored by Sierra research team, led by Karthik Narasimhan (co-author of the 2022 ReAct paper and the 2023 Reflexion paper)
00:00:00 Introduction
00:01:21 Clay’s background
00:03:20 Google before the ChatGPT moment
00:07:31 What is Sierra?
00:12:03 What’s possible now that wasn’t possible 18 months ago?
00:17:11 AgentOS
00:23:45 The solution to many problems with AI is more AI
00:28:37 𝛕-bench
00:33:19 Engineering task vs research task
00:37:27 What tasks can you trust an agent with now?
00:43:21 What metrics will move?
00:46:22 The reality of deploying AI to customers today
00:53:33 The experience manager
01:03:54 Outcome-based pricing
01:05:55 Lightning Round
Phaidra’s Jim Gao on Building the Fourth Industrial Revolution with Reinforcement Learning
Aug 20 2024
After AlphaGo beat Lee Sedol, a young mechanical engineer at Google thought of another game reinforcement learning could win: energy optimization at data centers. Jim Gao convinced his bosses at the Google data center team to let him work with the DeepMind team to try. The initial pilot resulted in a 40% energy savings and led him and his co-founders to start Phaidra to turn this technology into a product. Jim discusses the challenges of AI readiness in industrial settings and how we have to build on top of the control systems of the 70s and 80s to achieve the promise of the Fourth Industrial Revolution. He believes this new world of self-learning systems and self-improving infrastructure is a key factor in addressing global climate change.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
Mustafa Suleyman: Co-founder of DeepMind and Inflection AI and currently CEO of Microsoft AI, known to his friends as “Moose”
Joe Kava: Google VP of data centers to whom Jim sent his initial email pitching the idea that would eventually become Phaidra
Constrained optimization: the class of problem that reinforcement learning can be applied to in real world systems
Vedavyas Panneershelvam: co-founder and CTO of Phaidra; one of the original engineers on the AlphaGo project
Katie Hoffman: co-founder, President and COO of Phaidra
Demis Hassabis: CEO of DeepMind
Fireworks Founder Lin Qiao on How Fast Inference and Small Models Will Benefit Businesses
Aug 13 2024
In the first wave of the generative AI revolution, startups and enterprises built on top of the best closed-source models available, mostly from OpenAI. The AI customer journey moves from training to inference, and as these first products find PMF, many are hitting a wall on latency and cost. Fireworks Founder and CEO Lin Qiao led the PyTorch team at Meta that rebuilt the whole stack to meet the complex needs of the world’s largest B2C company. Meta moved PyTorch to its own non-profit foundation in 2022 and Lin started Fireworks with the mission to compress the timeframe of training and inference and democratize access to GenAI beyond the hyperscalers to let a diversity of AI applications thrive. Lin predicts when open and closed source models will converge and reveals her goal to build simple API access to the totality of knowledge.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
PyTorch: the leading framework for building deep learning models, originated at Meta and now part of the Linux Foundation umbrella
Caffe2 and ONNX: ML frameworks Meta used that PyTorch eventually replaced
Conservation of complexity: the idea that every computer application has inherent complexity that cannot be reduced but merely moved between the backend and frontend, originated by Xerox PARC researcher Larry Tesler
Mixture of Experts: a class of transformer models that route requests between different subsets of a model based on use case
Fathom: a product the Fireworks team uses for video conference summarization
LMSYS Chatbot Arena: crowdsourced open platform for LLM evals hosted on Hugging Face
00:00 - Introduction
02:01 - What is Fireworks?
02:48 - Leading PyTorch
05:01 - What do researchers like about PyTorch?
07:50 - How Fireworks compares to open source
10:38 - Simplicity scales
12:51 - From training to inference
17:46 - Will open and closed source converge?
22:18 - Can you match OpenAI on the Fireworks stack?
26:53 - What is your vision for the Fireworks platform?
31:17 - Competition for Nvidia?
32:47 - Are returns to scale starting to slow down?
34:28 - Competition
36:32 - Lightning round
GitHub CEO Thomas Dohmke on Building Copilot, and the Future of Software Development
Aug 6 2024
GitHub invented collaborative coding and in the process changed how open source projects, startups and eventually enterprises write code. GitHub Copilot is the first blockbuster product built on top of OpenAI’s GPT models. It now accounts for more than 40 percent of GitHub revenue growth for an annual revenue run rate of $2 billion. Copilot itself is already a larger business than all of GitHub was when Microsoft acquired it in 2018. We talk to CEO Thomas Dohmke about how a small team at GitHub built on top of GPT-3 and quickly created a product that developers love and can’t live without. Thomas describes how the product has grown from simple autocomplete to a fully featured workspace for enterprise teams. He also believes that tools like Copilot will bring the power of coding to a billion developers by 2030.
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Mentioned in this episode:
Nat Friedman: Former Microsoft VP (and now investor) who came up with the idea that Microsoft should buy GitHub
Oege de Moor: GitHub developer (and now founder of XBOW) who came up with the idea of using GPT-3 for code and went on to create Copilot
Alex Graveley: principal engineer and Chief Architect for Copilot (now CEO of Minion.ai) who came up with the name Copilot (because his boss, Nat Friedman, is an amateur pilot)
Productivity Assessment of Neural Code Completion: Original GitHub research paper on the impact of Copilot on developer productivity
Escaping a room in Minecraft with an AI-powered NPC: Recent Minecraft AI assistant demo from Microsoft
With AI, anyone can be a coder now: TED2024 talk by Thomas Dohmke
JFrog: The software supply chain platform that GitHub just partnered with
00:00:00 - Introduction
00:01:18 - Getting started with code
00:03:43 - Microsoft’s acquisition of GitHub
00:11:40 - Evolving Copilot beyond autocomplete
00:14:18 - In hindsight, you can always move faster
00:15:56 - Building on top of OpenAI
00:20:21 - The latest metrics
00:22:11 - The surprise of Copilot’s impact
00:25:11 - Teaching kids to code in the age of Copilot
00:26:38 - The momentum mindset
00:29:46 - Agents vs Copilots
00:32:06 - The Roadmap
00:37:31 - Making maintaining software easier
00:38:48 - The creative new world
00:42:38 - The AI 10x software engineer
00:45:12 - Creativity and systems engineering in AI
00:48:55 - What about COBOL?
00:50:23 - Will GitHub build its own models?
00:57:19 - Rapid incubation at GitHub Next
00:59:21 - The future of AI?
01:03:18 - Advice for founders
01:05:08 - Lightning round
Meta’s Joe Spisak on Llama 3.1 405B and the Democratization of Frontier Models
Jul 30 2024
As head of Product Management for Generative AI at Meta, Joe Spisak leads the team behind Llama, which just released the new 3.1 405B model. We spoke with Joe just two days after the model’s release to ask what’s new, what it enables, and how Meta sees the role of open source in the AI ecosystem. Joe shares that Llama 3.1 405B really focused on pushing scale (it was trained on 15 trillion tokens using 16,000 GPUs) and he’s excited about the zero-shot tool use it will enable, as well as its role in distillation and generating synthetic data to teach smaller models. He tells us why he thinks even frontier models will ultimately commoditize, and why that’s a good thing for the startup ecosystem.
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Mentioned in this episode:
Llama 3.1 405B paper
Open Source AI Is the Path Forward: Mark Zuckerberg essay released with Llama 3.1.
Mistral Large 2
The Bitter Lesson by Rich Sutton
00:00 Introduction
01:28 The Llama 3.1 405B launch
05:02 The open source license
07:01 What's in it for Meta?
10:19 Why not open source?
11:16 Will frontier models commoditize?
12:41 What about startups?
16:29 The Mistral team
19:36 Are all frontier strategies comparable?
22:38 Is model development becoming more like software development?
26:34 Agentic reasoning
29:09 What future levers will unlock reasoning?
31:20 Will coding and math lead to unlocks?
33:09 Small models
34:08 7X more data
37:36 Are we going to hit a wall?
39:49 Lightning round
Klarna CEO Sebastian Siemiatkowski on Getting AI to Do the Work of 700 Customer Service Reps
Jul 23 2024
In February, Sebastian Siemiatkowski boldly announced that Klarna’s new OpenAI-powered assistant handled two thirds of the Swedish fintech’s customer service chats in its first month. Not only were customer satisfaction metrics better, but by replacing 700 full-time contractors the bottom-line impact is projected to be $40M. Since then, every company we talk to wants to know, “How do we get the Klarna customer support thing?” Co-founder and CEO Sebastian Siemiatkowski tells us how the Klarna team shipped this new product in record time, and how embracing AI internally with an experimental mindset is transforming the company. He discusses how AI development is proliferating inside the company, from customer support to marketing to internal knowledge to customer-facing experiences. Sebastian also reflects on the impacts of AI on employment, society, and the arts while encouraging lawmakers to be open minded about the benefits.
Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
Mentioned in this episode:
DeepL: Language translation app that Sebastian says makes 10,000 translators in Brussels redundant
The Klarna brand: The offbeat optimism that the company is now augmenting with AI
Neo4j: The graph database management system that Klarna is using to build Kiki, their internal knowledge base
00:00 Introduction
01:57 Klarna’s business
03:00 Pitching OpenAI
08:51 How we built this
10:46 Will Klarna ever completely replace its CS team with AI?
14:22 The benefits
17:25 If you had a policy magic wand…
21:12 What jobs will be most affected by AI?
23:58 How about marketing?
27:55 How creative are LLMs?
30:11 Klarna’s knowledge graph, Kiki
33:10 Reducing the number of enterprise systems
35:24 Build vs buy?
39:59 What’s next for Klarna with AI?
48:48 Lightning round
Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs
Jul 16 2024
LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives. Yet despite incredible hype around AI agents, we’re still far from that “tipping point” with best-in-class models today. As one measure: coding agents are now scoring in the high teens on the SWE-bench benchmark for resolving GitHub issues, far above the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go. Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world? To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
00:00 Introduction
01:11 Leaving Russia, discovering science
10:01 Getting into AI with Ioannis Antonoglou
15:54 Reflection AI and agents
25:41 The current state of AI agents
29:17 AlphaGo, AlphaZero and Gemini
32:58 LLMs don’t have a ground truth reward
37:53 The importance of post-training
44:12 Task categories for agents
45:54 Attracting talent
50:52 How far away are capable agents?
56:01 Lightning round
Mentioned:
The Feynman Lectures on Physics: The classic text that got Misha interested in science.
Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.
Mastering the game of Go without human knowledge: 2017 AlphaGo Zero paper
Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.
Mapping the Mind of a Large Language Model: Article about Anthropic mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet
Pieter Abbeel: Berkeley professor and founder of Covariant who Misha studied with
A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that defined reinforcement learning and deep reinforcement learning