How Lisp is designing Nanotechnology (with Prof. Christian Schafmeister)

Developer Voices

One of our oldest languages meets one of our newest sciences in this episode, as we talk with Professor Christian Schafmeister, an award-winning nanotech researcher who's been developing a language and a design suite to help research the molecular machines of the future.

In this episode Christian gives us a quick chemistry lesson to explain what his research is trying to achieve, then we get into the software that's doing it: a new flavour of Common Lisp. But why Lisp? What advantages does a 60-year-old language design offer? How does he strike a balance between high-level language features and the need for exceptional performance and parallelism? And what tricks does his development environment have that modern IDEs could still learn a thing or two from?

--

Clasp (the Lisp): https://github.com/clasp-developers/clasp

Cando (the design language): https://github.com/cando-developers/cando

The Feynman Prize: https://en.wikipedia.org/wiki/Feynman_Prize_in_Nanotechnology

Alphafold: https://alphafold.ebi.ac.uk/

More on LEaP: https://ambermd.org/tutorials/pengfei/

Interactive Development of Crash Bandicoot: https://all-things-andy-gavin.com/2011/03/12/making-crash-bandicoot-gool-part-9/ 

Christian's Research Group: https://www.schafmeistergroup.com/

Kris on Twitter: https://twitter.com/krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

--

#programming #software #lisplang #commonlisp #nanotech

Transcript

Introduction to Common Lisp in Nanotech

00:00:00
Speaker
This week on Developer Voices, we're looking at the future as we often do, but not the future of programming per se. We're going to talk about how some very well established ideas in programming are being used to accelerate the bleeding edge of chemistry. My guest this week is Professor Christian Schafmeister, and he's been working on a new implementation of Common Lisp to speed up the design of nanotechnology.
00:00:26
Speaker
Bits of this at times sound like science fiction, but it's very concrete. The plan is that next year they're going to be building enzyme-sized machines that are being designed right now in a Jupyter notebook, in a REPL, in Lisp.
00:00:41
Speaker
You have to ask, why Lisp in this day and age? And we're going to get into that. Lisp still has a few features that more recent languages would struggle to emulate and might do well to emulate. To get there, we're going to have to go back and learn some of the most interesting chemistry I've heard about since school. So we better get cracking.

Christian Schafmeister's Background

00:01:01
Speaker
I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Christian Schafmeister.
00:01:21
Speaker
I'm joined today by Professor Christian Schafmeister. How are you doing, Christian? Great, I'm great. Glad to have you here. You are our first professor. Oh. Which feels very formal. Normally we have a lot of people on this show who have picked up a lot of knowledge along the way, whereas you're steeped in a particular field, which isn't our usual wheelhouse. It's chemistry. Yeah.
00:01:48
Speaker
So we're going to talk about the language you've been building, but we really have to start with why you needed a language because this is the closest I think we've got to science fiction on this show.

Building Molecular Machines: The Vision

00:02:01
Speaker
Yeah, so I've been developing something for the last 30 years where I want to be able to build machines on a molecular scale. So I have gone deep into organic chemistry and I now teach organic chemistry.
00:02:19
Speaker
And we've been developing these molecules that are programmable, that are made out of building blocks that are like rings. And when we snap them together, it's kind of like Lego. You can make molecules with different shapes. And they're complex. And I have been developing software since I was 12. I was one of the kids who kind of hung out at Radio Shack and learned how to program in BASIC on a TRS-80.
00:02:49
Speaker
And I knew that I wanted to use software to design these molecules and that it would be very complex. And so I've been working towards both of those things in parallel over the last like 30 years.
00:03:03
Speaker
This is very cool. With a different chipset, that's basically where I started off too, at Radio Shack. We have a similar background. Sure. But we're not talking nanotechnology here, are we? These aren't nanobots.

Enzymes as Revolutionary Tools

00:03:20
Speaker
These are different kinds of chemical machines.
00:03:22
Speaker
We are talking about nanotechnology. The things that we build are on the nanometer scale. This is absolutely from the bottom up. We build things from molecules and build them up, and we know where every atom is in space. I'm actually a Feynman awardee. I think it was in
00:03:43
Speaker
2005, you'd have to look that up. But yeah, I have the Feynman Prize. And yeah, I've been funded for nanotechnology in the past. This is cool. So this is machines at the level of taking inspiration from enzymes, that kind of thing? Yeah, that's really where we're kind of focused because enzymes are the key technology that
00:04:07
Speaker
I believe human beings need to develop. Those are big molecules that can make other molecules, molecules that can make feedstocks for industry and fuel and take garbage and turn it into useful things and detoxify pollutants. All of that can be done by enzymes.
00:04:28
Speaker
We have a whole bunch of them that nature gives us, but they were evolved for the purposes of living things. There are so many chemical reactions that if we could create enzymes for them, we could solve every material problem that human beings have for all time to come. It's a fundamental technology that we need to develop. You really see this as completely changing the world.
00:04:54
Speaker
Absolutely. If we could make molecules that could build other molecules, then every problem, every physical material problem we have, we could solve. Somewhere between that and shrinking chemistry professors down to the nanoscale. That would make us simpler. I don't know about that.
00:05:17
Speaker
The essence is like an enzyme is a molecule that's large. It's got a pocket and it organizes groups inside of it. So it acts like a little breadboard, like a little circuit. Another molecule will pop into that pocket and there's a push and pull of electrons. It is the smallest electronic circuit and it works for just a few picoseconds.
00:05:47
Speaker
A new product emerges, and then another one goes in, and that gets repeated over and over again without the enzyme being modified at all. It's the essence of life, and right now we can't do that rationally anywhere near as well as nature has done. Nature has
00:06:11
Speaker
millions of these enzymes that it's evolved over countless millions of years that can accelerate many, many reactions.

Design Challenges in Custom Enzyme Creation

00:06:24
Speaker
I'm just trying to understand this for the background. You're going to custom-make an enzyme that takes some raw materials and builds a particular flavour of molecule.
00:06:41
Speaker
Is there then a larger structure beyond that? Are you going to fill a vial with some enzymes that take raw components and put them at level one, and then the next enzyme in the chain gradually builds up something more complex, until you've got something on the physical scale recognisable as a thing?
00:06:58
Speaker
Yeah, you've got it. Systems of these could build complex molecules. They can fix things. Here's a crazy example. One of the theories of aging is that basically sugar, glucose, cross-links
00:07:17
Speaker
proteins outside of our cells in what's called the extracellular matrix and stiffens up the tissue. That's known. That actually happens. The cross-link is called glucosepane.
00:07:35
Speaker
If you had a specific little enzyme that could go in there and cleave those cross-links, you might be able to cure a lot of diseases of aging. I don't know if it'd turn back the clock, but it's one current hypothesis for how aging happens: basically, glucose cross-linking proteins. It's called the Maillard reaction. It's the basis of cooking. And that's basically what happens to us over 60, 70, 80 years.
00:08:03
Speaker
Right. If you can make an enzyme that could cleave those cross-links specifically without doing anything else, you might be able to treat a lot of diseases of aging. What's the timeline on this? We can make these enzymes on a small scale right now. Is that right? We can't design good ones. It's difficult to make big molecules that can create pockets and wrap around smaller molecules.
00:08:33
Speaker
It's really difficult to do that. But if we had the recipe, we'd know we could physically manufacture them. Yeah, one recipe would be based

Existing Software Limitations

00:08:43
Speaker
on proteins. And there's a lot of people writing software to design proteins, like AlphaFold, you've heard of that. Rosetta is a software package that comes out of the University of Washington, David Baker's group. That has been developed over the last 30 years to design proteins.
00:09:01
Speaker
Proteins can do this kind of stuff. They are difficult to design with. My life's work is to come up with a more engineerable building block set. So proteins are sort of like bead necklaces or charm bracelets. They're a long string. They have little charms on them.
00:09:24
Speaker
And the charms, some of them are really greasy, some of them love water, and nature has put them together in a particular order, so they fold up into a ball with little pockets on them, and that's how they create the pockets that can do the work.
00:09:43
Speaker
That folding process is a grand challenge of science, predicting how proteins fold. Now, Rosetta and a lot of other people who have worked on using software to repack the insides of proteins have been tremendously successful, but creating the intricate inside of an enzyme
00:10:07
Speaker
is really difficult to get exactly right, because you have to control where groups are to within a tenth of an angstrom.

New Molecular Design Methodology

00:10:16
Speaker
AlphaFold has been a huge breakthrough in using deep learning to predict the folds of proteins whose three-dimensional shapes we don't know.
00:10:30
Speaker
I've got a different approach. I thought, let's build building blocks that are just easier to design with in the first place. So instead of making a charm bracelet, a bead necklace, let's make little ladders, like rungs of ladders, and then snap them together through two connections at a time. So you make a bunch of rings that are fused together. You make things that snap together more like Lego than like beads linked on a necklace.
00:11:00
Speaker
And then they're easier to design with, and then you write software to design them. So your idea is that you're going to have a tray full of different building blocks and then figure out how to assemble them to make interesting enzymes. And this is where we get into Cando. Is it pronounced "can do"? Which is a great acronym: computer-aided design, nano... I've forgotten the O.
00:11:28
Speaker
computer-aided nanostructure design and optimization. Right. So we're really on the level of CAD software for molecules. Yeah, yeah. I started, I've written this like four times and I started out writing a CAD graphical user interface sort of software. But every time I got something, my students came up with chemistry that it couldn't handle and it required a redesign.
00:11:59
Speaker
And then I settled on: I'm just going to write a language and build a user interface on top of that. Oh, right. Now here's where we get into language design. So at what point did you say, I'm not going to be able to do this with the existing software that's out there, I need to build something custom? I think I was writing it in Smalltalk at that time.
00:12:27
Speaker
And I had a really nice user interface. And my group figured out how to put what we call functional groups, sort of side chains off of each building block. And the way I was building the molecules just wasn't going to work with that. And so I just scrapped it and started over again. I spent a lot of time writing. Chemistry requires performance.
00:12:57
Speaker
You're always doing things on lots of atoms that require writing loops that do intricate things and you need it to be very fast. So most of my stuff was written in C++.

Why Common Lisp?

00:13:13
Speaker
And then I would hook that into other languages. For the longest time, I was using Python. But the interop between Python and C++, and managing the lifetimes of objects, became very troublesome. So I came up with another way, and that's where the current Cando, and Clasp, the Common Lisp it's built on, came from. Now, this is an unusual route, particularly these days, I think,
00:13:42
Speaker
to go from Smalltalk to Common Lisp. What year are we talking when you did this? Probably about 2005, 2006. That still counts as relatively recent in the grand scheme of things. Why Common Lisp? I was looking around at the time for a language that would allow exploratory programming.
00:14:06
Speaker
I knew I needed automatic memory management. I needed performance. That was the key thing, performance. I had this core of chemistry code that was written in C++. It's about a quarter of a million lines of code at that point.
00:14:28
Speaker
And I had it all hooked into Python and was running into a lot of trouble. This was back in the Python 2 days, before Python 3. And I was running into a lot of trouble maintaining that.
00:14:43
Speaker
A friend of mine was at NASA and he'd worked a lot with Common Lisp and said, you should check out this language. I was using XML a lot for serializing data at the time, so I started getting into this idea of nested scope in the language.
00:15:04
Speaker
I started implementing a Lisp in my C++ and found that it worked very well with the chemistry ideas that I was trying to implement, so I moved on from there. I don't want to monopolize the conversation, so I'm going to pause. No, it's your job to monopolize the conversation.
00:15:22
Speaker
So what is it about Lisp that lends itself to chemistry, though? I can't quite see that. It's really easy to express graphs and trees and linear sequences in Lisp with parentheses. I don't know, it just flowed really well. So
00:15:46
Speaker
I had all the C++ code. I was writing a Lisp interpreter, basically implementing my own Lisp. And I didn't go very far before I realized it was crazy to try and implement my own language, because I've developed in a lot of languages and I know how difficult it is to get that right. So I just looked at what kind of Lisp implementations were out there.
00:16:11
Speaker
And there's Scheme, there's Common Lisp. You know, Emacs is based on Lisp. There are a couple of them out there. But I wanted one that was full-featured, battle-tested, and had been used to implement large programming systems. And, you know, Common Lisp is used in the
00:16:33
Speaker
Google Flights, the engine behind that is all implemented in Common Lisp. And so I thought, you know, Scheme's got this specification that's like 25 pages. Common Lisp has this specification that's like four inches thick. I'll go with a four inch thick specification. Not the decision I would have made just on those metrics. I do have a tendency of always taking the hard road on things. You don't become a professor if you don't like reading, right?
00:17:03
Speaker
I guess, yeah, yeah.

Integrating Technologies for Performance

00:17:06
Speaker
So there are several implementations of Common Lisp, and one of them is implemented in C. This is called Embeddable Common Lisp, or ECL. ECL, yeah. Much of that is written in Common Lisp itself. It's self-hosting, and it has a core of C code to do the lower-level stuff.
00:17:32
Speaker
Right. So I just took their... it's a GPL software package. It's got a GPL license on it, or an LGPL license. So I took the Common Lisp code from ECL and just started writing my Lisp interpreter so that it would execute that.
00:17:54
Speaker
And I just kept going and executing more and more of it, and my Lisp turned into a Common Lisp implementation. Oh, I see. OK. It also has good C++ interoperation. I was doing a lot of C++ template programming, and I implemented something like Boost.Python to integrate C++ and the Lisp.
00:18:20
Speaker
And then I integrated the LLVM compiler library, exposed that to the Lisp, and then I started writing the backend to generate LLVM IR using the LLVM C++ API. And that's how it all grew. Why that step? I can see why you need C++ access, but why go into the LLVM part?
00:18:45
Speaker
Performance. With LLVM, I can get native code compilation. Without that, I would always have an interpreter. Right. So you kind of cannibalized, not cannibalized, but stood on the shoulders of an existing Common Lisp implementation. Yes. And Daniel, the developer of ECL, and I
00:19:13
Speaker
hang out on IRC and fix bugs in each other's systems, because we share a lot of code. It's been great working with that community, and with the larger Common Lisp community as well.
00:19:35
Speaker
In what circumstances do you think this is the right choice? I mean, if someone else is in your position with a large C++ code base, when would taking your path be a sensible one? Well, I would hook it into Clasp. I think it's a great tool for exposing C++ code in a high-level language that has dynamic memory management and a bunch of other features.
00:20:06
Speaker
I've hooked in DNA sequence analysis libraries like SeqAn, and it's very easy to integrate C++ code with Clasp, and that gives you access to all the Common Lisp libraries. I would have thought that, I mean, any time I think of programming in the science world, I think of Python. Are you losing something by moving away from Python in this?
00:20:35
Speaker
Yeah, you know, Common Lisp: if you take Python code and look at all the function calls, you know, foo, open parenthesis, arguments, close parenthesis, then move the name of the function inside, after the first parenthesis, and remove the commas, that's basically Lisp. You know, it's not that big a change. Then you have all these functions that give you, you know, string handling,
00:21:06
Speaker
file handling, all sorts of stuff like that. And you have a full language there. I mean, it's really not that different from working in Python. It's just that it compiles to native code and it's
00:21:24
Speaker
Common Lisp was developed, and it's a standard language, so it has a standard that was developed back in the 80s, and it's a forever language. The code that I wrote 10 years ago in Lisp I'm using now, and libraries that I use, some of them have been written
00:21:43
Speaker
over the last couple of decades, and there's a large library base of Common Lisp that works because the language doesn't need to constantly change. There's a standard, and everyone writes to the standard. There are multiple implementations, so everyone writes code that works on
00:22:04
Speaker
a large number of implementations. There's a philosophy of writing things properly, rather than writing them just so that they work, or so that they work now. I gather a lot of software in the science world suffers from that problem. Understandably, because most people in the science world aren't primarily programmers.
00:22:29
Speaker
They're not, and there's a tendency to write stuff that does what you need to get the paper out, and then it rots, then it rots and dies.
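To make the syntax point concrete, here is a small Python sketch of the rewrite Christian describes: move the function name inside the parentheses and drop the commas. It uses Python's standard `ast` module and handles only plain calls, names, numbers, and strings; `foo`, `bar`, and `x` are placeholder names. Real Common Lisp has far more (keyword arguments, macros, special forms), which this toy deliberately ignores.

```python
import ast


def to_sexpr(node: ast.AST) -> str:
    """Render a tiny subset of Python expressions as Lisp-style s-expressions."""
    if isinstance(node, ast.Expression):
        return to_sexpr(node.body)
    if isinstance(node, ast.Call):
        if node.keywords:
            raise ValueError("keyword arguments are not handled in this sketch")
        # foo(a, b) -> (foo a b): the function name moves inside the
        # parentheses and the commas disappear.
        parts = [to_sexpr(node.func)] + [to_sexpr(arg) for arg in node.args]
        return "(" + " ".join(parts) + ")"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return '"' + node.value.replace('"', '\\"') + '"'
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return str(node.value)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def python_call_to_lisp(source: str) -> str:
    """Parse a single Python expression and print it as an s-expression."""
    return to_sexpr(ast.parse(source, mode="eval"))


print(python_call_to_lisp('foo(x, bar(1, "two"))'))  # prints (foo x (bar 1 "two"))
```

The nesting carries over unchanged: a call inside an argument list simply becomes a parenthesised list inside another list.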
00:22:39
Speaker
But I've got to ask you that question again, because you take, okay, so there's an argument that you take any programming language and strip the syntax out and you end up with Lisp, because Lisp is the programming language with almost no syntax. But Python isn't just the syntax and the core library functions. It's this whole ecosystem of science packages, particularly mathematics. Are you missing that in your common Lisp world?
00:23:05
Speaker
You know, some, absolutely. There is a lot of momentum behind it. But if I, you know, if I started writing this in Python, I would really bog down right now with, you know, you can't write loops in Python and expect them to run quickly. And when you write, like, I work with a lot of three-dimensional data.
00:23:29
Speaker
You know, like I've got a molecule, it's three-dimensional XYZ coordinates, and I need to put water molecules around it. So I have to do a loop across X, wrapped in a loop across Y, wrapped in a loop across Z. I've got three nested loops. And then inside of that, I've got it doing something really complicated, like figuring out if a water molecule is overlapping any atoms in my molecule. If you write that in Python, it's gonna take an hour to run.
00:23:58
Speaker
In Common Lisp, with a little bit of C++ assistance, it happens in a fraction of a second. I can't do without performance.
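The triple loop described above can be sketched in Python as a naive grid search. This is an illustrative assumption, not Cando's algorithm: the grid spacing, clash distance, and padding are made-up defaults in angstroms, and a production solvation tool would use spatial hashing and proper water models. The work grows as the number of grid points times the number of atoms, and in CPython each of those distance checks goes through the interpreter, which is the cost Christian is pointing at.

```python
import math


def place_waters(atoms, spacing=3.1, min_dist=2.6, padding=5.0):
    """Naive grid solvation sketch (illustrative parameters, in angstroms).

    Tries a water oxygen at every point of a cubic grid covering the molecule's
    bounding box plus padding, and keeps the points that are at least
    min_dist away from every atom.
    """
    lo = [min(atom[axis] for atom in atoms) - padding for axis in range(3)]
    hi = [max(atom[axis] for atom in atoms) + padding for axis in range(3)]
    # Grid points per axis; the tiny epsilon guards against float round-off.
    counts = [math.floor((hi[d] - lo[d]) / spacing + 1e-9) + 1 for d in range(3)]

    waters = []
    for i in range(counts[0]):              # outer loop across X
        x = lo[0] + i * spacing
        for j in range(counts[1]):          # nested loop across Y
            y = lo[1] + j * spacing
            for k in range(counts[2]):      # innermost loop across Z
                z = lo[2] + k * spacing
                point = (x, y, z)
                # The expensive step: does a water here overlap any atom?
                if all(math.dist(point, atom) >= min_dist for atom in atoms):
                    waters.append(point)
    return waters


# Hypothetical two-atom "molecule" (coordinates in angstroms).
waters = place_waters([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
print(f"placed {len(waters)} waters")
```

Moving only the innermost overlap test into compiled code is the kind of split Christian describes later, where profiling decides what crosses into C++.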

The Performance Imperative

00:24:10
Speaker
Time is the most valuable thing to me. So I'll invest a lot of time in developing the software if it runs quickly when I need it to. And I'm now running things on large clusters, distributed computing across large clusters that would take years of
00:24:28
Speaker
time when it's run on a single CPU. Yeah, I can see that. Python is more than fast enough for the average web server, but when you're folding proteins in 3D space, I can see that every nanosecond counts, right? I've programmed since I was 12, so more than 40 years. And all the Python that I wrote back in the 90s is
00:24:54
Speaker
gone, dead. When they changed to Python 3, I didn't upgrade all those libraries, and it's now gone. So all of that was kind of taken away from me. I don't want that to happen again. So I'm only developing now in forever languages, languages that are going to be around 20, 30, 40, 50, 100 years from now. It's one of the few languages where I could predict the syntax won't change much between now and then.
00:25:22
Speaker
Yeah, and I have the rare experience of writing software that's been used for more than 30 years. Before I started graduate school, I wrote a program called LEaP, which is kind of the front end for AMBER, one of the large academic molecular dynamics packages. It simulates the motions of proteins and DNA in research.
00:25:46
Speaker
It comes out of the University of California, San Francisco. Before I started school, I was given the job of writing a front end for it, to make it easy for researchers to load in their proteins and set up calculations to run on this software, which at the time was written in Fortran.
00:26:08
Speaker
Now it's got the fastest GPU implementation, but my software, LEaP, is still being used today. Probably some significant percentage of the world's biomolecular simulations go through LEaP
00:26:23
Speaker
every day and have for the last 30 years. That thing was written in C and they've tried to replace it, like the Amber community has tried to replace it, I think twice and have not succeeded because it does an essential difficult job really well. The downside is it hasn't been improved very much in 30 years because it's difficult for people to get into the way it was written.
00:26:54
Speaker
Well, that raises a question of collaboration. You're writing this software in Common Lisp for an audience that, if they know programming at all, probably knows Python.

Development Tools in Lisp

00:27:07
Speaker
I'm guessing a team of technically astute chemists. How are you finding the user experience, and how are you making it usable for people?
00:27:22
Speaker
There's really powerful tooling for Common Lisp. There's a software package called SLIME, which runs in Emacs.
00:27:41
Speaker
It lets you connect to a running Common Lisp instance and develop code. It's got an interactive debugger. It's got autocompletion. It has everything you want. It's a really beautiful programming environment, a sort of integrated development environment. It's all text-based, working in Emacs.
00:28:03
Speaker
So, we fully support SLIME. I use it every day. I've got it open right now. It's one of the most wonderful interactive exploratory programming experiences
00:28:16
Speaker
I think you'll find. Now, I haven't used a lot of modern IDEs for C++ or Python, so maybe I'm talking out of my hat there, but it is a really fluid programming experience, so we have that. We've also developed a Jupyter kernel that runs Clasp or Cando.
00:28:39
Speaker
And so it's got widgets for displaying molecules and graphs and things like that. So you can open a Jupyter notebook, type in some Common Lisp, and see your molecule, that kind of thing? Yeah, exactly. My usual development environment is that I've got
00:29:02
Speaker
Cando running. I've got a Jupyter notebook instance talking to it, and I've also got SLIME running in Emacs, also talking to it. I'm developing code in SLIME, I'm seeing the effects, the output, in the JupyterLab environment, and that's how I get my work done. If that plays out the way
00:29:25
Speaker
I'm seeing in my head, I think a lot of modern IDEs and more modern languages would struggle to match that. If you've got the mixture of the interactive programming experience and the live visualization, there aren't many languages that do that well, assuming yours does. That's a good point. I really hadn't considered it. I was really inspired by the
00:29:55
Speaker
Sorry, what is that new Apple language that... Swift? Yeah, Swift. When they first demoed Swift, they had this sort of interactive programming environment. I love that demo. I'm really inspired by the Smalltalk interactive programming environment. SLIME is really great.
00:30:14
Speaker
I want something where I can be talking to the programming environment, add functions, change everything about it. I want to be able to change class. I need to be able to change class definitions, functions, add generic functions that will add new functionality. I want to do all that stuff interactively, and that's what I have with this system.
00:30:42
Speaker
Yeah. Yeah. I think to that I can say, I think Lisp is the only language where I've genuinely felt like I'm having a conversation with the computer, you know, that dynamic interactive experience. Yeah. That's, that's an interesting way to put it. I hadn't really considered it, but yeah, I would agree. It is, it is really, I really enjoy programming in it. Um,
00:31:10
Speaker
That's cool. This is a very nerdy question, but I'm allowed to ask those. So you're looking at an interactive Lisp with C bindings for the fast bits. Did you ever consider just using Emacs Lisp?
00:31:28
Speaker
When I was first getting into Lisp, I was a little confused about what was what. And I thought that that would be an option. But it doesn't have the native compilation. It's a reference implementation of a language. And so I decided, no, I'm going with a standard. Fair enough.
00:31:52
Speaker
Okay. In that case, let's talk about this performance bridge. I'm assuming you've got C++ where it needs to be really, really fast, and as far as possible you're in Lisp land for the convenience, the interactivity, the high-level, preferred way of programming.

Optimization Techniques

00:32:12
Speaker
Tell me about the bridge and when you know, it's time to put something on this side or that side of the bridge.
00:32:18
Speaker
Yeah, now I usually start, I don't know, I've got to really, at this point, I've got an intuitive sense of when I really need speed and when I can get away with that. I do all the sort of low level C++ work in implementing the Common Lisp. So I do a lot of C++ development, a lot of C++ template programming,
00:32:47
Speaker
And I think I'm pretty good at it now.
00:32:51
Speaker
We implemented LLVM as the back end so that we can generate native code. And that code runs pretty well, but I still can't implement something in Common Lisp and have it run as efficiently as it would if I implemented in C++. Because Common Lisp does try and make everything safe, and everything's wrapped in our, most of the objects are wrapped in wrappers. So that slows things down.
00:33:21
Speaker
We've now implemented a bytecode compiler in Clasp to deal with the slow compilation of LLVM, which is a great library, by the way. My train of thought just derailed. Could you reiterate the question?
00:33:45
Speaker
you've got Lisp in programmer space where it's nice and C in computer space where it's fast. Which parts of that go on which side of the line? Profiling. Profiling. Profiling. So because we generate LLVM IR, we can compile everything to LLVM IR down to native code and it all turns into object files and
00:34:09
Speaker
you know, basic Unix ELF and Mach-O sort of stuff. So we can profile the Common Lisp against the C++ perfectly, using perf, or using DTrace on the Mac. And we generate flame charts, and I can see where time is being spent in the code. So it's profiling. Right. How much work was it to get that working? A lot, because it's really fussy,
00:34:40
Speaker
especially the JIT-compiled code, the just-in-time compiled stuff that we're generating all the time. To get the names accessible to the profiling tools, we had to futz around a lot. They basically had to solve the same problem for JavaScript in browsers, so we used those mechanisms once I learned about them.
00:35:04
Speaker
But yeah, we can profile everything on an even level. And then, depending on where time is being spent, I will move stuff into C++ if I need to. Okay. But can you do that all while the same process is running? So can you take a function that's running too slowly and interactively update it with a C++ definition, or interactively compile it to C++ on demand, or...
00:35:33
Speaker
No, no. Changing the C++ code means rebuild of the system. Okay. That takes about five minutes and then restart everything and reload everything. Um, yeah. Okay. So this is another reason to keep certain things in Lisp space for the interactive reloading without restarting thing. Yeah. Which really puts me in mind of Crash Bandicoot. Do you remember that video game? Yeah. Yeah.
00:36:02
Speaker
Yeah, that was a good sales pitch for Lisp back in the day. Yeah, so I do most of my development in Lisp. And it's only when profiling shows me that I've got a problem that I move it into C++ or rejigger it to get things into C++. OK, then maybe we should talk a bit more about what's good in Lisp space.

The Role of Macros in Lisp

00:36:25
Speaker
So what about the poster-child feature of Lisp, macros? Are you using the macro system much?
00:36:32
Speaker
Some. You have to use that carefully. It's a powerful, pointy tool. But yeah, it lets you write your own
00:36:45
Speaker
flavour of language. Yeah, we do use it. The macro system is really the basis of Lisp. All the stuff that makes it really powerful uses the macro system, so in that sense we're using it all the time. But writing my own macros, I don't do that that often. Okay. What about the reader system?
00:37:15
Speaker
It's another powerful tool. I have a recent use case for that. We have all of our specialized arrays, so for floats and doubles and integers, where they're just representing compact data in memory.
00:37:37
Speaker
Okay. I need to serialize large, complex data structures to disk and then load them back in for running calculations. We could talk about the save-lisp-and-die thing. But for saving these large vectors, I'm using the Lisp printer and reader
00:37:58
Speaker
to dump these complex data structures to disk and then load them back in. And that is fairly slow. It's like pickle, or like using JSON, but you can have internal references in the data structure. So you can build up cyclical graphs.
00:38:15
Speaker
Yes, and that's an absolute necessity for what we have, because molecules are graphs and, like it or not, they're cyclic. Yeah, and so it handles all that really well. But writing out the arrays was taking a lot of time, so what I did is I just
00:38:36
Speaker
turned this 8-bit data into human-readable 6-bit data, using 6-bit characters. I write out this stream of characters to represent the compact array, and I wrote a reader macro that, when it detects the reader macro characters, reads in that 6-bit data and turns it back into an 8-bit data structure in memory.
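Here is a minimal Python sketch of the trick Christian describes: pack raw 8-bit bytes into printable 6-bit characters (essentially the Base64 scheme) and have the reader decode them straight back into a compact array. The `#@` prefix, the `count:` header, and the alphabet are hypothetical choices for illustration; the episode doesn't say which characters Clasp's reader macro dispatches on or how the data is laid out. Python has no reader macros, so `read_packed` stands in for the function a reader macro would call once it sees the prefix.

```python
import array

# Hypothetical layout: "#@" dispatch prefix, byte count, ':', then 6-bit characters.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}
PREFIX = "#@"


def write_packed(data: bytes) -> str:
    """Pack 8-bit bytes into printable 6-bit characters (every 3 bytes -> 4 chars)."""
    out = []
    acc = nbits = 0  # bit accumulator and number of pending bits
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            out.append(ALPHABET[(acc >> nbits) & 0x3F])
        acc &= (1 << nbits) - 1  # drop the bits already emitted
    if nbits:  # flush leftover bits, zero-padded on the right
        out.append(ALPHABET[(acc << (6 - nbits)) & 0x3F])
    return f"{PREFIX}{len(data)}:{''.join(out)}"


def read_packed(text: str) -> bytes:
    """Reader side: after seeing PREFIX, turn the 6-bit characters back into bytes."""
    if not text.startswith(PREFIX):
        raise ValueError("not a packed array")
    count, _, payload = text[len(PREFIX):].partition(":")
    out = bytearray()
    acc = nbits = 0
    for ch in payload:
        acc = (acc << 6) | DECODE[ch]
        nbits += 6
        if nbits >= 8:  # at most one whole byte becomes available per character
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out[: int(count)])  # discard padding bits at the end


# Usage: round-trip a compact array of doubles through text.
coords = array.array("d", [1.5, -2.25, 3.0])
text = write_packed(coords.tobytes())
restored = array.array("d")
restored.frombytes(read_packed(text))
```

Every 3 bytes become 4 characters, so the text is a third larger than the raw data. Reading it back is just bit shifting rather than printing and parsing one decimal number per element, which is plausibly where much of the speedup comes from.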
00:39:05
Speaker
That took my load times for these things from 10 minutes down to 30 seconds. Which must come into play when you're writing C++ and having to reboot the world. Do you have a way to persist between sessions?
00:39:25
Speaker
When you're working with Lisp, you end up with this kind of working memory of where you're at. Can you flush that entire thing reliably to disk, reboot and bring it back and be exactly where you were? Yes.
00:39:38
Speaker
Yeah. So one of the really remarkable features of Lisp is this idea of save-lisp-and-die, where you just checkpoint the whole thing. You do all your work, you get your memory all set up: you load all your files, you get everything set up to do something. And then you say, okay, save all this memory.
00:40:00
Speaker
And then later on, you start up the Common Lisp system from that memory and you're right back where you left off. So I implemented that feature in Clasp, and I can load 100 megabytes of complex data structures for a complex design calculation, save the thing to disk, and then start it up again in less than four seconds.
00:40:27
Speaker
That's a rare feature these days, right? In most programming languages, you can't checkpoint the state of the program. How much work was it to implement that? You have to have a really good understanding of how the memory is laid out and of how our garbage collection works.

Memory Management and Concurrency

00:40:48
Speaker
So we support precise garbage collection. So I know where every pointer is in memory.
00:40:54
Speaker
And so it's just a matter of keeping track of all that stuff, writing it all into a block of memory, dumping that to disk, loading that back into memory, feeding it back to the garbage collector, which will put it wherever it wants to in memory, and then fixing up all those internal pointers. And the memory survives that round trip?
00:41:20
Speaker
It's pretty complicated and fussy, but it's working. What else can I say about it? Is it reliable? It's working perfectly. If one pointer was out of place, it would crash the system within moments.
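The round trip described here can be modeled in a few lines of Python: dump the heap into one block, let an allocator place each object wherever it likes, then fix up every internal pointer. This is a toy, not Clasp's implementation. The `Ptr` tag stands in for what precise garbage collection provides (knowing exactly which fields are pointers), the dict-based heap and `bump_allocator` are hypothetical, and `check_heap` is only in the spirit of an exhaustive pointer check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Ptr:
    """A field known to be a pointer (precise GC: we always know which fields are pointers)."""
    addr: int


Field = Union[int, Ptr]
Heap = Dict[int, List[Field]]  # address -> object fields
Image = List[Tuple[int, List[Field]]]  # (offset from base, fields with offset pointers)


def save_image(heap: Heap, base: int) -> Image:
    """Write every object into one position-independent block: addresses become offsets."""
    return [
        (addr - base, [Ptr(f.addr - base) if isinstance(f, Ptr) else f for f in fields])
        for addr, fields in sorted(heap.items())
    ]


def load_image(image: Image, allocate: Callable[[int], int]) -> Heap:
    """Let the allocator place each object anywhere, then fix up every internal pointer."""
    forward = {off: allocate(len(fields)) for off, fields in image}  # old offset -> new address
    # A dangling pointer would raise KeyError in the forwarding lookup below.
    return {
        forward[off]: [Ptr(forward[f.addr]) if isinstance(f, Ptr) else f for f in fields]
        for off, fields in image
    }


def check_heap(heap: Heap) -> bool:
    """Exhaustively verify that every pointer field lands on the start of a live object."""
    return all(f.addr in heap for fields in heap.values() for f in fields if isinstance(f, Ptr))


def bump_allocator(start: int, word: int = 8) -> Callable[[int], int]:
    """Hypothetical allocator that hands out consecutive addresses starting at `start`."""
    next_addr = start

    def allocate(n_fields: int) -> int:
        nonlocal next_addr
        addr, next_addr = next_addr, next_addr + n_fields * word
        return addr

    return allocate


# Usage: two "atoms" that point at each other form a cycle, like bonds in a molecule graph.
heap = {1000: [6, Ptr(1016)], 1016: [1, Ptr(1000)]}
restored = load_image(save_image(heap, base=1000), bump_allocator(50000))
```

The forwarding table is what lets the collector choose new addresses freely: nothing in the saved block assumes where it will be loaded, and a single bad entry would leave a pointer aimed at garbage, which is why an exhaustive check is worth having.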
00:41:38
Speaker
I have a memory test tool that will exhaustively check that every pointer is pointing to a valid object, and that gives me a lot of confidence that everything's fine. What we can't carry across are threads and open files: it's got to shut down all threads except the main thread, and it has to close all file handles. But that's not as big a problem as you'd think. Why not? I would have thought it was.
00:42:07
Speaker
Well, for the stuff that I need to do, it's not. I can generate a memory image in which we've loaded all the packages that we need for JupyterLab to run, then start up JupyterLab from it, and time to first plot is less than two seconds. Okay. Fair enough. That makes me think: you mentioned the word threads.
00:42:38
Speaker
And I can see how you've got a requirement for parallel processing and threading. How does that fit into a hybrid Lisp/C++ model?
00:42:49
Speaker
How does that fit? Common Lisp has a standard, and it's also got several libraries that have become de facto standards. Bordeaux Threads, for instance, is the de facto standard for creating threads, taking them down, communicating, and setting up locks between them.
00:43:11
Speaker
And we use that to do multi-threaded programming. Built on top of that are convenience libraries like lparallel, where you can just do a parallel map. Common Lisp has lists and list comprehension, so I can map over a list. Classic tongue twister: list and Lisp.
00:43:35
Speaker
With lparallel, you parallel-map over a list. So I can just give it a list of work and it will allocate that to multiple threads, and I don't have to think about it. That's what I do. Again, the advantage of working mostly in a higher-level language. Yeah, yeah. And when I'm doing these design calculations, I've got a 28-core machine and it's using 26 of them flat out.
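The "give it a list of work and let it spread across threads" pattern has a close Python analogue. This is only an analogy to lparallel's parallel map, not its API: `parallel_map` and `score_design` are hypothetical names, and in a standard CPython build pure-Python CPU-bound work won't actually run in parallel on threads because of the global interpreter lock, so `ProcessPoolExecutor` from the same module is the usual swap for that case.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T], workers: Optional[int] = None) -> List[R]:
    """Apply fn to every item on a pool of worker threads; results come back in input order."""
    # Default to one worker per CPU, i.e. use every core you can get.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(fn, items))


def score_design(design: int) -> int:
    """Hypothetical stand-in for an expensive design calculation."""
    return sum(i * i for i in range(design))


scores = parallel_map(score_design, [10, 20, 30])
```

The caller never touches threads or locks; it hands over a list and gets a list back, which is the convenience Christian is pointing at.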
00:44:03
Speaker
I get 80 to 90% utilization of all the CPUs when I'm doing these design calculations. That must be speeding up the high-level chemistry task a hell of a lot. Tremendously. Nice. Okay, so zooming back out then, where is this tool going to take the research?

Towards Therapeutics and Environmental Solutions

00:44:25
Speaker
We are starting to design molecules now to be therapeutics and catalysts. We're doing that now. That's where we needed to go. So we're essentially there. It has taken a long time and I really just got the high level design calculations working in the last couple of months. Okay. And so does that put you in a place where
00:44:55
Speaker
I mean, the people using this, are they saying to themselves, I think this would be a great model, I think this would be a great enzyme to build, I think it will have these pieces? Are they working like a Lego designer, or are they Monte Carlo-ing a number of things and testing which one works? The people who are using this are basically me right now. Okay. So what are you doing? Let me talk to them.
00:45:24
Speaker
What we're doing is this: a lot of catalysts have metal atoms at their heart, like rhodium, palladium, and platinum. So you say, okay, I've got a palladium atom and I want to have groups that attach to it, hold it in place, and leave a space near it where another molecule can come in and interact with that metal, touch it.
00:45:47
Speaker
So you say, I need a group here, I need a group here and here and here. And then you say, okay, find me a scaffold that can hold those groups in place. The scaffold is built out of our building blocks, and so the software goes and tries lots of different scaffolds, doing a Monte Carlo search through design space, and tries to find the ones that can hold the groups in the right constellation.
00:46:12
Speaker
Is this getting into the realm of things like, I don't know, genetic algorithms and that kind of stuff? You can use those as search algorithms. I'm using Monte Carlo. It's the simplest, most powerful, most elegant algorithm to do this. You have a scoring function, you generate possibilities, and you just crank on them on as many CPUs as you can get your hands on.
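Christian doesn't spell out his move set or acceptance rule, so the sketch below assumes a standard Metropolis Monte Carlo: start from a random sequence of building blocks, swap one block at a time, always accept improvements, and accept worse candidates with probability exp(-Δ/T). The `toy_score` function (distance to a made-up target sequence) is a hypothetical placeholder for a real geometric score that asks whether a scaffold holds the groups in the right constellation.

```python
import math
import random
from typing import Callable, List, Tuple

Design = List[int]  # indices into the building-block library


def monte_carlo_search(
    score: Callable[[Design], float],
    n_blocks: int,
    length: int,
    steps: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[Design, float]:
    """Metropolis Monte Carlo over building-block sequences; lower score is better."""
    rng = random.Random(seed)
    current = [rng.randrange(n_blocks) for _ in range(length)]
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(length)] = rng.randrange(n_blocks)  # swap one block
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always take improvements; take worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


TARGET = [3, 17, 22, 8]  # hypothetical "ideal" sequence for the toy score


def toy_score(seq: Design) -> float:
    """Placeholder scoring function: total distance of each position from TARGET."""
    return sum(abs(a - b) for a, b in zip(seq, TARGET))


best, best_score = monte_carlo_search(toy_score, n_blocks=40, length=4, steps=50_000)
```

Occasionally accepting worse moves is what lets the search climb out of local minima; independent runs with different seeds are trivially parallel, which fits the "as many CPUs as you can get" approach.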
00:46:40
Speaker
And possibly to finish off: it's very hard to predict the future, and very hard to predict time scales, but at what point will we see a Clasp-designed molecule being made in a lab? I'm hoping in the next year. In the next year? Yeah, not hoping. I'm doing it in the next year. OK. So the other thing is,
00:47:07
Speaker
I'm funded by the Department of Defense, and I've got a company that's developing this technology. We've developed a way to synthesize these molecules, and we've made thousands of them in the last three years. We can make these molecules very quickly and easily: it takes one or two days to make any reasonably sized molecule, and we're doing this on robots.
00:47:35
Speaker
The problem now is: which ones do we make? At my company, we're putting them together into millions of different configurations, essentially at random, and then throwing them at the wall to see what sticks. The synthesis, those are our hands; we can make those things. The software is meant to be our eyes, so that we can see and predict what to make. So I'm bringing that online now.
00:48:03
Speaker
Okay. The search space is potentially so vast. That's the problem we're solving. Vast. In theory, we have thousands of building blocks. In practice right now, we have 40. You put four of those building blocks together in a sequence. That's a very small thing, but it could be a drug.
00:48:25
Speaker
So then you've got 40 to the power of 4 different shapes that you can make from just that small configuration. You can either try to make some subset of them, throw them at the wall, and see what sticks, or you can try to design them in software. That's what Cando is for: designing them. I don't know how well it's going to work. It's an experiment.
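To put numbers on that: four building blocks in a sequence, with order mattering and any block allowed at any position, gives $n^4$ possibilities for a library of $n$ blocks. With today's 40 blocks that is about 2.6 million, in line with the "millions of different configurations" mentioned earlier. Taking 1,000 as an illustrative round number for "thousands of building blocks" (an assumption, not a figure from the episode), it becomes a trillion:

$$
40^4 = 2{,}560{,}000 \approx 2.6 \times 10^{6},
\qquad
1000^4 = 10^{12}.
$$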
00:48:51
Speaker
Yeah, well, you're in the research business. Someone else can commercialize it. And eventually, what will we have? Fields of Schafmeister wheat? Try to imagine for us the future that you're heading towards. The first one is new therapeutics: medicine to treat disease, and new diagnostics to recognize emerging biological threats.
00:49:17
Speaker
Then catalysts that can accelerate reactions and turn cheap starting materials into valuable products. I have to ask, then: if you've got something like this biological machine that manufactures drugs, or manufactures useful things out of raw materials, I'm just trying to think of places
00:49:42
Speaker
like war zones, for instance, aid work, virus outbreaks. Will these things be able to manufacture things in the field if we find them? This is kind of long-term science fiction; we're stretching out to the end of the podcast, so give me some insight. Potentially, yes. We can rapidly build these molecules, and if we can rapidly design them,
00:50:10
Speaker
Then yes, you could build things in the field. There's a lot of machinery that goes into this. It's not the first thing I'm going after. What I'm really focused on is a basic capability. Can we design a molecule that does what we want?
00:50:27
Speaker
when we want to make new therapeutics, new molecules that can wrap around proteins and act as diagnostics, or accelerate reactions, or create channels that can purify single molecules out of mixtures. Could we pull lithium out of seawater or uranium out of seawater? You could imagine membranes that could just selectively pass uranium
00:50:52
Speaker
and keep all the sodium, lithium, and potassium in seawater out. Then you could pull metals out of seawater, and it would be a lot less environmentally damaging than mining.
00:51:05
Speaker
Yeah, plus presumably dealing with pollution as well. Exactly. When we can build things on the molecular scale like that, with intent, with rational design, we can solve a lot of our problems. That's what I'm working towards. Yeah, I can totally see the potential, and I'm quite pleased that there's Lisp somewhere in that future.
00:51:31
Speaker
Yeah, it's a fun programming environment to develop this stuff in. Well, I hope it goes well. Christian, thanks very much for telling us all about it. Kris, thank you for taking the time. Cheers.
00:51:45
Speaker
Thank you, Christian. Now, you'll only have noticed this if you've been watching this episode on YouTube, but throughout that conversation, Christian was drinking coffee out of a lab measuring beaker, which is brilliant and very on-brand for a chemist. If nanomachines turn out to be the future of Lisp, remember you heard it here first. So please take a moment to like and subscribe, share with a friend, rate, post it to the socials, all those good things. I appreciate the feedback and the support.
00:52:14
Speaker
It's also worth saying that this particular episode only happened because Christian got in touch with me and said, I've got an interesting programming topic that you might like. So if you have an interesting programming topic that I might like, my contact details are in the show notes as always. With that, I will leave you for now. I've been your host, Kris Jenkins. This has been Developer Voices with Christian Schafmeister. Thanks for listening.