Intro topic: Grills



News/Links:



 * You can’t call yourself a senior until you’ve worked on a legacy project
   * https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
 * Recraft might be the most powerful AI image platform I’ve ever used — here’s why
   * https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
 * NASA has a list of 10 rules for software development
   * https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
 * AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
   * https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre 

Book of the Show

 * Patrick: 
   * The Player of Games (Iain M. Banks)
     * https://a.co/d/1ZpUhGl (non-affiliate)
 * Jason: 
   * Basic Roleplaying: Universal Game Engine
     * https://amzn.to/3ES4p5i
       
       



Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h



Tool of the Show

 * Patrick: 
   * Pokémon Sword and Shield
 * Jason: 
   * Features and Labels (https://fal.ai)

Topic: Reinforcement Learning



 * Three paradigms of machine learning
   * Supervised Learning
   * Unsupervised Learning
   * Reinforcement Learning
 * Online vs Offline RL
 * Optimization algorithms
   * Value optimization
     * SARSA
     * Q-Learning
   * Policy optimization
     * Policy Gradients
     * Actor-Critic
     * Proximal Policy Optimization
 * Value vs Policy Optimization
   * Value optimization is more intuitive (value loss)
   * Policy optimization is less intuitive at first (policy gradients)
   * Converting values to policies in deep learning is difficult (e.g., taking an argmax over Q-values is hard with large or continuous action spaces)
 * Imitation Learning
   * Learning a policy with supervised learning from expert demonstrations (behavior cloning)
   * Often used to bootstrap reinforcement learning
 * Policy Evaluation
   * Propensity scoring (importance sampling) versus model-based estimates
 * Challenges in training RL models
   * Two optimization loops
     * Collecting feedback vs updating the model
   * Difficult optimization target
     * Policy evaluation
 * RLHF (Reinforcement Learning from Human Feedback) & GRPO (Group Relative Policy Optimization)
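
To make the value-optimization bullets a bit more concrete, here is a minimal tabular sketch of SARSA and Q-learning. It is not code from the episode: the 5-state corridor, the reward of 1 at the goal, and the names step, epsilon_greedy, train, and greedy_policy are hypothetical, chosen only for illustration. The two methods differ only in the bootstrap target: SARSA (on-policy) uses the value of the action the agent will actually take next, while Q-learning (off-policy) uses the best next action. In a small table, turning values into a policy is just an argmax per state, which is the step that becomes hard in deep RL with large or continuous action spaces.

```python
import random

# Hypothetical toy environment: a 1-D corridor with states 0..4.
# The agent starts at state 0; state 4 is the terminal goal.
N_STATES = 5
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, 0.0, False


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(method, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA ("sarsa") or Q-learning ("q_learning") on the corridor."""
    if method not in ("sarsa", "q_learning"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(200):  # cap episode length as a safety net
            next_state, reward, done = step(state, action)
            next_action = None if done else epsilon_greedy(q, next_state, epsilon, rng)
            if done:
                target = reward  # terminal state: nothing to bootstrap from
            elif method == "sarsa":
                # On-policy: bootstrap from the action the behavior policy actually takes.
                target = reward + gamma * q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the best action, whatever is actually taken.
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Convert action values to a policy: argmax in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        print(method, greedy_policy(train(method)))
```

With these settings both methods should end up preferring RIGHT (action 1) in every non-terminal state; the difference is that SARSA's values describe the exploring epsilon-greedy policy, while Q-learning's values approach those of the greedy policy.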

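For the policy-optimization side, here is a minimal REINFORCE (vanilla policy gradient) sketch on a hypothetical three-armed bandit with fixed rewards. It hints at why policy gradients feel less intuitive at first: rather than regressing toward a target value, it nudges the logits of a softmax policy along grad log pi(a), weighted by how much better the sampled reward was than a baseline. The baseline here is the mean reward of a group of samples from the current policy, loosely in the spirit of GRPO's group-relative advantage, but this is a simplification that omits GRPO's std normalization, PPO-style clipping, and KL penalty, so it should not be read as an implementation of GRPO. ARM_REWARDS, reinforce, and its parameters are illustrative assumptions.

```python
import math
import random

# Hypothetical bandit: three arms with fixed (noise-free) rewards; arm 2 is best.
ARM_REWARDS = [0.2, 0.5, 0.8]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps=2000, group_size=16, lr=0.2, seed=0):
    """REINFORCE with a group-mean baseline on a softmax policy over arms."""
    rng = random.Random(seed)
    arms = list(range(len(ARM_REWARDS)))
    logits = [0.0] * len(arms)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a group of actions from the current policy and observe rewards.
        actions = rng.choices(arms, weights=probs, k=group_size)
        rewards = [ARM_REWARDS[a] for a in actions]
        baseline = sum(rewards) / group_size
        grad = [0.0] * len(arms)
        for a, r in zip(actions, rewards):
            advantage = r - baseline
            # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
            for k in arms:
                grad[k] += advantage * ((1.0 if k == a else 0.0) - probs[k]) / group_size
        # Gradient ascent: raise logits of actions whose reward beat the baseline.
        logits = [x + lr * g for x, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    print([round(p, 3) for p in softmax(reinforce())])
```

With these settings the printed probabilities should concentrate on the last arm. In this toy setup the group-mean advantages sum to zero within each batch, so the best arm's logit never decreases and the worst arm's never increases, which keeps the example well behaved.
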
★ Support this podcast on Patreon ★ [https://www.patreon.com/programmingthrowdown]

180: Reinforcement Learning

Programming Throwdown educates Computer Scientists and Software Engineers on a
cavalcade of programming and tech topics. Every show will cover a new
programming language, so listeners will be able to speak intelligently about any
programming language.