This is a conversation between me and a machine learning programmer. I find machine learning fascinating, because for the longest time, humans have been the most clever creatures on the planet. It seems we’ve finally found a suitable partner in computers when it comes to numerical calculations, and machine learning can help humans answer questions they could spend lifetimes on, or never find out at all. Enjoy.
Tell us a little about yourself.
My name is Daniel Crockett, and I’m currently a grad student at UT getting a master’s in computer science. Originally for my undergrad, I did civil engineering. I then lived in Myanmar for two years; I worked for an American investment firm that was a startup in Myanmar. Then, I decided that I really liked computer science and I wanted to go back to school. So I came back to the United States and enrolled, well, applied to UT and got in, and now I’m a master’s student there. I especially wanted to go back to computer science because of machine learning. I thought it was pretty interesting, the stuff you can do with it; it’s kinda the field that’s really popular right now, a lot of advancements are being made…. So, that’s how I ended up in machine learning.
How long have you studied computer science?
So, unofficially, 5 or 6 years…. Even while I was in undergrad, I…had an internship where I basically modeled wetlands*, which was a lot of computer science with some engineering mixed in. But then I’ve been going to grad school for one year, so officially one year. Most of my knowledge of computer science, especially the stuff that’s not machine learning, came from studying on my own.
*Mr. Crockett is talking about using computer science to take advantage of the disproportionate ecological services partially inundated ecosystems can provide.
What is machine learning?
Essentially, machine learning is using statistics to… create a program that will learn or change based on the data that you’re feeding into it. The scientific explanation is [that] you learn from an experience with respect to some task and some performance measure. The very technical definition of machine learning… is when the program learns from experience (e) with respect to some class of tasks (t) and performance measure (p) if its performance … (t) as measured by (p) improves with … (e). Which is basically to say, you give it a lot of data (because that’s what experiences are, like even for humans, when we talk about experiences, it’s just data that our brains are collecting), and then from that data, we learn, and we improve at certain things.
For example, reading. You have a lot of data fed into you, like how different letters correspond to different sounds, and you retain that. You can also use that data to make assumptions, like [how a word you’ve never seen before is pronounced]. It’s the same thing for computers. You feed them data, and they improve at whatever task you want them to do with that data.
And that explanation comes from Tom Mitchell. It’s the popular one that I see pop up a lot, when you’re trying to talk about machine learning starting from the basics.
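To make the experience, task, and performance idea concrete, here is a minimal sketch in Python built on the reading example above. It is only an illustration, not something Mr. Crockett described: the letter and sound pairs, the test set, and the function names (`learn`, `accuracy`, `learning_curve`) are all made up. The "learner" simply memorizes what it has seen, so it shows the score improving with more data but not the guessing-at-new-words part.

```python
from typing import Dict, List, Tuple

# Experience (e): hypothetical letter/sound pairs, invented for illustration.
EXPERIENCE: List[Tuple[str, str]] = [
    ("b", "buh"), ("a", "ah"), ("t", "tuh"), ("s", "sss"), ("m", "mmm"),
]

# Task (t): give the right sound for each of these letters.
TEST_LETTERS: List[Tuple[str, str]] = [
    ("b", "buh"), ("a", "ah"), ("t", "tuh"), ("s", "sss"),
]


def learn(examples: List[Tuple[str, str]]) -> Dict[str, str]:
    """Build a letter -> sound table from the examples seen so far."""
    table: Dict[str, str] = {}
    for letter, sound in examples:
        table[letter] = sound
    return table


def accuracy(table: Dict[str, str], test: List[Tuple[str, str]]) -> float:
    """Performance measure (p): fraction of test letters sounded out correctly."""
    correct = sum(1 for letter, sound in test if table.get(letter) == sound)
    return correct / len(test)


def learning_curve(experience: List[Tuple[str, str]],
                   test: List[Tuple[str, str]]) -> List[float]:
    """Score after seeing the first 0, 1, 2, ... examples."""
    return [accuracy(learn(experience[:n]), test)
            for n in range(len(experience) + 1)]


if __name__ == "__main__":
    for n, score in enumerate(learning_curve(EXPERIENCE, TEST_LETTERS)):
        print(f"examples seen: {n}  accuracy: {score:.2f}")
```

Each extra example either raises the accuracy or leaves it unchanged, which is the "performance at (t), as measured by (p), improves with (e)" part of the definition in miniature.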
Tell us about an example of software that uses machine learning concepts.
Here’s a very simple machine learning task… you’re trying to predict property prices based on their size. And obviously that’s pretty easy, it’ll probably be a linear [relationship]. But you just get a bunch of properties and put each one’s size on the X axis and its price on the Y axis, right, and you get all your data points and you draw a [best fit] line through them. That’s technically a machine learning algorithm. It’s using all the different data on the size and price to create a correlation and then, from that, it can predict, given the size, what the price will be. So that’s like a very simple example. Curve fitting is a very simple machine learning algorithm, and it’s probably the easiest thing to understand.
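The best-fit line described above can be sketched in a few lines of Python using ordinary least squares. This is an assumed, minimal version: the listings are invented and deliberately fall exactly on a line so the result is easy to check by hand, and the names `LISTINGS`, `fit_line`, and `predict_price` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical, made-up listings: (size in square feet, price in dollars).
LISTINGS: List[Tuple[float, float]] = [
    (1000, 200_000), (1500, 275_000), (2000, 350_000), (2500, 425_000),
]


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance of size and price divided by variance of size.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size: float, slope: float, intercept: float) -> float:
    """Read the predicted price for a given size off the fitted line."""
    return slope * size + intercept


if __name__ == "__main__":
    slope, intercept = fit_line(LISTINGS)
    print(f"price = {slope:.2f} * size + {intercept:.2f}")
    print(f"predicted price for 1800 sq ft: {predict_price(1800, slope, intercept):.2f}")
```

On this made-up data the fitted line works out to price = 150 * size + 50000. Real listings would scatter around the line rather than sit on it, and the fit would then be the line that minimizes the squared vertical distances to the points.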
What are some challenges a machine learning programmer may encounter?
Machine learning is not a guaranteed [solution]. You don’t know what answer you want to have before you start. A lot of algorithms in machine learning aren’t hard-set in what you should do, there are little numbers in your algorithm that are not set in stone…
A lot of times if you don’t know exactly what you’re looking for, you can’t know if you’ve found it and if you’ve found it correctly. Troubleshooting can be very difficult in machine learning. You might get a really weird answer, but that might be because your algorithm is correct, and you have no idea what the answer really should be… a lot of times I’m not 100% sure what I’ve done is correct, because I’ll get an answer, and it’ll be a strange answer, but there’s no sense of what the answer should necessarily be.
Like, for graphics, if you’ve messed up, your stuff will look terrible, right?… For machine learning, you may get a really weird answer, but it might be the right answer, because all you have is data, and you don’t necessarily know what the implications of that data are until you’ve messed around with it.
That also makes it really exciting though! It’s not like, oh, it’s just another day at the lab, I know exactly how this is going to turn out. You really have no idea. It’s very easy to get to a place in machine learning where you don’t know what the answer should be.
What, in your experience, is the most interesting application of machine learning?
So I really liked natural language processing, because it’s essentially an attempt to make computers understand human language. And there’s a lot of ambiguity in human language. One kind of project that I worked on at the end of last semester was word-sense. Everyone knows how words can have multiple meanings. “Bark” can be the sound a dog makes or it can mean stuff on a tree. So that’s like, two completely different meanings (word-senses). But there’s also [words] like “book”, where, it can mean a physical book, it can be an eBook. It’s not necessarily something that’s bound and has paper pages. (Quick note from Buskmiller- Mr. Crockett doesn’t mention why this is an important distinction, but the point is that there might be much more nuanced context in a scene that reveals whether someone is reading a paper book or an eBook, while the sense of the word “bark” can be found out much quicker from its position in a sentence.)
So what I was trying to do, along with a partner, is find the word-senses of different words. The way you find out the word-senses is you look at the words surrounding it. So this had been done before where [the computer] looked at the words surrounding the [word in question]. We tried to find the word-senses for the surrounding words to find the word-sense we were looking for. And it turns out that it doesn’t really improve it, and you get diminishing returns.
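The basic idea of picking a word-sense from the surrounding words can be sketched as a simple overlap count, in the spirit of the classic Lesk approach. This is not the method Mr. Crockett and his partner used, which the interview does not detail; the clue words in `SENSE_CLUES` and the function name `guess_sense` are hypothetical and hand-picked for the "bark" example.

```python
import re
from typing import Dict, Set

# Hypothetical clue words for each sense, hand-picked for illustration only.
SENSE_CLUES: Dict[str, Dict[str, Set[str]]] = {
    "bark": {
        "dog sound": {"dog", "loud", "growl", "heard", "night"},
        "tree covering": {"tree", "trunk", "rough", "oak", "peeled"},
    },
}


def guess_sense(word: str, sentence: str) -> str:
    """Pick the sense whose clue words overlap most with the surrounding words."""
    # Lowercase the sentence, split it into words, and drop the target word.
    context = set(re.findall(r"[a-z']+", sentence.lower())) - {word}
    scores = {
        sense: len(clues & context)
        for sense, clues in SENSE_CLUES[word].items()
    }
    # On a tie, max() keeps the first sense listed in SENSE_CLUES.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(guess_sense("bark", "The dog let out a loud bark at night."))
    print(guess_sense("bark", "The bark of the old oak tree was rough."))
```

A real system would learn the clue words from labeled text instead of listing them by hand. The variation described in the interview would go one step further and first work out the senses of the surrounding words too.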
(A similar idea was put into place to write an eighth Harry Potter book with machine learning. My favorite quote is “To Harry, Ron was a loud, slow, and soft bird. Harry did not like to think about birds.”)
What are some misconceptions about machine learning that may have arisen from pop culture or science fiction?
I think a lot of people, when they think of machine learning, they think of robot brains. And, unfortunately, we’re pretty far away from self-aware robots or anything you see in science fiction. So I guess that would be a misconception. A lot of people just think instantly when they hear AI they go to robots. But there are a lot of things that are really interesting that are not robots in [machine learning]. Like natural language processing, like predicting stock values. People who like to make money find that VERY interesting.
Sometimes fantasy stories are frightening because very intelligent computers turn against their human companions (2001: A Space Odyssey, Ex Machina). Is this reasonable? Why or why not?
It seems like a pretty easy problem to avoid if you restrict what computers have access to. So you have your computer, and if you don’t put arms and legs on it, it’s not gonna go anywhere. If you restrict its access to the internet, it’s not going to figure out how to connect. There are physical limitations you can impose… even if self-aware AI is possible, it seems like it would be fairly easy to impose physical limitations.
Many thanks to Mr. Crockett for this interview! If you know someone who has experience in an interesting branch of computer science, email me at email@example.com.