Placeholder Image

字幕列表 影片播放

  • Today we're going to do some more live coding, and we're going to talk

  • about something which is quite close to my heart, because I wrote some of the

  • early papers on it many years ago, and that's something called functional

  • parsing, or combinator parsing. Before we get started with actual live coding, I

  • want to begin with the question. And the question is: what actually is a parser?

  • So for me a parser is a program which takes a string of characters as its input, and

  • as its output produces some form of tree. And the idea is that the tree makes

  • explicit the structure in the input string. So that's a bit of a mouthful, so

  • let's have a simple example to explain what's going on with this. So what we've

  • got here is we've got a string of five characters -- 2 plus 3 times 4 -- and

  • that's our input string. But when we look at this we know that that's not just a

  • random sequence of five characters, it's actually got some structure to it. So in

  • particular, we've got three numbers here -- 2, 3 and 4 -- and we've got two

  • arithmetic operators -- we've got the plus and the times -- and one of the things we

  • learn at school is that the times happens before the plus. So you really do 3 times 4

  • here, and then you add the 2 on at the end. So that's our input string.

  • So what a parser does, is it takes a string of characters -- like that -- and it

  • tries to recognize the structure in the form of a tree. And the tree we would get

  • from a parser is something like this here. And you see we've got some leaves

  • in the tree -- the leaves are the numbers 2, 3 and 4 -- and the nodes in the

  • tree are the two arithmetic operators -- that's the plus symbol and the times

  • symbol -- and you can see the structure of the tree reflects the fact that we do

  • the multiplication. First we multiply 3 by 4, and then we do the

  • addition second. So that's the basic idea of what a parser is: it takes as input a

  • string of characters, and as output it produces a tree. So it's really a

  • function -- it's taking as input a string and as output as a tree. So we've seen

  • what a parser is. It's a function that takes a string as an input and produces

  • a tree as an output. So now we want to think how can we actually implement this:

  • how can we implement the idea of a parser? And I'm going to do this in Haskell

  • today, but it doesn't matter if you don't know anything about Haskell because

  • I'll be explaining everything as I go along. And actually, nothing I'm going

  • to show you today is specific to a language like Haskell.

  • You can do it in any general-purpose programming language. So if you do a web

  • search for a 'combinator parsing' or 'functional parsing', and then whichever

  • language you're interested in, you'll find the same kind of stuff which I'm

  • going to show you today. So it's not specific to the Haskell setting. So

  • what we're going to do then first, is we're going to define precisely, in our

  • programming language, what it means to be a parser. We're going to define a new

  • type called Parser, and it's simply going to be a function which takes a string as

  • an input and produces a tree as an output. So it captures the very simple

  • idea of what we are wanting to do -- a parser is a function that takes a string

  • as an input, and gives the tree as an output, and the arrow here just means

  • that we have a function from one thing into the other. So this captures the

  • basic idea of what a parser is. But unfortunately, it's not sufficient to

  • program with. We need to refine this a little bit to actually write programs

  • with this. So the first little refinement we're going to make is that a parser

  • might not consume all of its input. So for example, if we're trying to parse a

  • number, like 2, we maybe we find the 2, and then maybe we've got some more stuff

  • left in the input string that we need to parse. And if we want to chain parsers

  • together, we're going to need to have access to the remaining input string

  • that we didn't manage to consume. So the first little refinement that we'll make

  • is rather than just returning a single tree, we're actually going to return a

  • pair now. We're going to return two things -- we're going to return a tree as

  • before, and we're also going to return the unconsumed part of the input string.

  • So that's the first refinement we've made. The second refinement we're going

  • to make is that a parser may not always succeed. We may be trying to parse a

  • number, and we don't find the number, we find something else. So we need to have a

  • way of representing that a parser can fail. So the way we're going to do that, is

  • we're actually going to make a parser return a list of results, rather than a

  • single result. So lists in Haskell are denoted using square brackets. So this

  • simply means that rather than returning one pair of results, we could return zero,

  • or one, or two, or as many as we like. And the idea is going to be if our parser can't

  • parse, so it doesn't succeed, we'll return an empty list of results. And if it does

  • succeed, we'll return a list with one pair -- we'll return a tree,

  • that represents the structure of the input string, and we'll return the

  • unconsumed part of the input. But because we're working with lists here,

  • we could actually be more flexible. We could return two, three, or four, or five,

  • or as many as we like parses. And this is actually quite a good flexibility,

  • because for some languages the input string may be ambiguous -- maybe we're

  • trying to parse English, and English sentences don't always have one parse,

  • they can be interpreted in many ways. So this type here is giving us a

  • flexibility to return many results if we wanted. We're not actually going to use

  • that flexibility today, but it's nice to actually have it. So I haven't told you

  • what the tree data type is here. And that's because I'm actually going to get

  • rid of that now. Sometimes you may want to return a number, or a program, or some

  • kind of other structure. So we're going to replace that specific type of trees

  • by some arbitrary type 'a'. And I'll make this a parameter of my type. This is our

  • final type, which we're going to work with today. What we're saying is that a

  • parser whose results have type 'a' is simply a function which takes a string

  • as an input, and then it gives a list of results. And each result is a pair

  • comprising a single value of type 'a' -- maybe a tree, maybe a number, maybe

  • something else -- and then an unconsumed part of the input string. Okay, so this is

  • our final type. And if you look in any of the articles, or books, about these kind

  • of parser combinators, or functional parsers, you'll find a type very

  • similar to this there. It's quite a mouthful again, so let's think about how

  • we could understand this in a simpler way. And actually, we can write a little

  • rhyme to understand what's going on with this type. So let me write the rhyme out

  • for you as a comment. So what we can say is: a parser for things, is a function

  • from strings, to lists of pairs, of things and strings. This is a little Dr. Seuss

  • rhyme to tell you what a parser is, or what a functional parser is. And that's

  • actually how I remember this type. So that's our basic type now. We've seen so

  • far, a parser is basically a function from strings to trees, but in order to

  • actually program with these kind of things, we need to refine the type a

  • little bit. And this is the type we'll be working with today. But you don't need to

  • worry about the details of it. Just basically think it's a function from

  • strings into trees, or some other kind of structure. What we're going to do now,

  • is we're going to load up the parsing library. So I'll start up

  • the compiler. This is a library, which contains a whole bunch of parsing stuff,

  • which allows us to program with parsers of this form. And this is a parsing library I

  • wrote myself. I'll see if I can get Sean to upload it as part of the video.

  • A parsing library comes with any programming language you can think off.

  • And again, if you just search for parser combinators, functional parsing,

  • whichever language you like, you'll find the library, which gives you all sorts of

  • basic ways of building parsers. And that's what I'm going to show you now.

  • All of these libraries work in the same way. So you have some basic primitives, or

  • basic building blocks, for parsers. And then you have a way of combining parsers

  • to build bigger parsers. So it's like a kind of Lego kit, or a construction kit.

  • You have some basic bricks that you can do things with. And then you can put

  • those bricks, or components, or primitives, together in all sorts of different ways.

  • So I'm going to show you a few of the primitives, and a few of the combining

  • forms. And then we'll do an example. The first primitive I want to show you is

  • very simple. It's just a way of parsing a single digit. So what the digit parser

  • does, is it takes a string of characters, and it tries to consume a single numeric

  • digit off the start of that string. And you might think, well, what does 'parse' do

  • here? What parse does, is it takes a parser, which in this case is just digit,

  • and it takes an input string to that parser, and it just applies one to the

  • other. So of course, this parser here is going to succeed, because we do have a

  • digit at the start of the input string. So we get exactly the expected result.

  • We get a list with one pair. And the first thing in the pair is the actual digit. We

  • get the character '1'. And the second thing in the pair is the unconsumed part of

  • the input. And that's something we could then try and parse subsequently with

  • another parser. We can test, does this thing fail properly? So, if I give it an

  • input string that doesn't have a single digit at the start, then it's going to

  • fail. So if I give it the input string "abc", there's no digit at the beginning, so

  • we're just going to get the empty list of results. So I'll show you one more

  • quick primitive. If I parse a single character, say an 'a', from that string, then

  • that will do the right thing. If I parse single character 'a', and I didn't have an

  • 'a' at the beginning, then it will fail. So we've seen two basic parsing

  • primitives here -- we've seen a way of parsing a digit,

  • a way of parsing a specific character, and we've seen that these things can

  • succeed or fail. And in the library, or in any of these libraries, there'll be a

  • bunch of these basic building blocks, or basic bricks, or primitives, that you can

  • use to build up your parsers. Where things get more interesting is when you

  • think how do you combine these kind of things, how do you use these basic bricks

  • to build actually useful parsers? Let me show you an example of this. So there's a

  • parsing combining form called 'some'. And what it does is it takes a parser as its

  • input, and it tries to apply it one or more times, as many times as possible. So

  • if we're trying to parse 'some' digits, what we're trying to do is consume one

  • digit, then two, then three, and as many as we can until we don't find any more

  • digits. So if we apply the 'some digit' parser to the string "123" then

  • it will do the expected thing. It will consume all three of the digits, and then

  • we'll get the empty string left here. So that's 'some', it gives us a form of

  • repetition. And we also have a very simple way of making a choice as well.

  • So if I want to make a choice here, between a digit, and a letter. And let me

  • parse the string "abc123". So, what we've got here, is this funny symbol

  • here -- with the three symbols -- that's a choice operator. It says do that, or do

  • that. So if I try to parse a digit, or a letter, what it's going to try first is

  • it will take the first character in the input string, and say is it a digit? If so,

  • I'll parse it. And if it's not a digit, then I'll go over to the other side, and

  • say is it a letter? And I will try and parse that. You can see what's happening

  • with this particular example here -- if I look at the first character, it's not a

  • digit, it's a letter. So when I apply the digit parser it would fail, and then the

  • 'or' operator, or the choice operator here, will go over to the other side and say,

  • well, is it a letter instead? And of course it is, so we can parse the single

  • 'a' off the front here, and then we get everything else as unconsumed.

  • And of course, if we wanted to be a bit more clever we could combine some of

  • these things. So I could say, some digit or letter -- get my brackets right --

  • "abc123", and then that will parse everything. Because all I'm doing here is

  • I'm repeating, or iterating, the choice between either parsing a single digit, or

  • a single letter. And I've got a string of digits and letters here,

  • so I can parse the whole thing. So I've consumed them all. And then I get nothing

  • left at the end. So what we've seen so far, is some basic building blocks, and

  • we've seen a couple of combining forms -- we've seen a way of doing

  • repetition, which is 'some', and we've seen a way of making a choice, which is the

  • funny operator in the middle there. What I haven't shown you so far, is how to do

  • some form of sequencing. And this is the most common thing you typically want to

  • do with parsers. You want to say do this, and then do that, and maybe do that as

  • well. You want to sequence things together. So I'll actually show you a bit

  • of the parsing library here. So here's the parsing library. And I don't want to

  • go through all the details of this, but one thing I want to know, is it's quite

  • short. If I kind of scroll down here, I think it's about four and a half screen

  • fulls. And I've got quite a big font here, and this is actually already quite a

  • sophisticated parsing library. So it shows you the power of this method, that you

  • don't need hundreds of lines of code to write parsers -- four and a half screen

  • fulls is a library which is fully fledged and you can

  • basically implement any parser that you like using this. So I'll

  • show you a couple of examples of sequencing. The first example I want to

  • show you is a parser for natural numbers. So what's a natural number? It's just a

  • non-negative integer, like 0, 1, 2, 3, or 10, or something like that. So you think how

  • do you parse a natural number? Well I'm going to use the sequencing notation for

  • parsers, which is to do notation. And the do notation is very simple -- you write

  • the word 'do', and then you have a whole bunch of parsers one after the other, and

  • it just runs them each in sequence. So the first thing here, is we're going to

  • parse 'some digits'. Because that's the basics of what a natural number is -- it's

  • just some digits. And if that succeeds, I'm going to call all those digits 'xs'.

  • So 'xs' is just going to be a list of all the digits. And then what I'm going to do

  • here to be a bit more flexible, probably when I parse a number I don't want a

  • string of characters back, I actually want the number. So I'm going to pass the

  • string in to a little function called 'read', which just converts the string into

  • a number. And then I'm going to simply return it. So the basic idea here is

  • we're sequencing two things together -- we're reading some digits, or parsing

  • some digits, and then we're translating those into a normal number, and then

  • we're returning it. And we're sequencing those things together using the 'do'

  • notation here. So just one more little example, because we'll use this in a

  • minute. Here's how you could parse an integer. So an integer is either a

  • negative number, or a positive number. So there's a

  • choice there. So we're going to use the choice operator. So here's the 'or'

  • operator, which we've seen a few minutes ago. And the two parts here just say -- have

  • we got a negative number, or have we got a positive number? So the parser for a

  • negative number, we use the 'do' notation, because we need to do three things.

  • So the three things are here. So if we're trying to parse a negative number, the

  • first thing we do is we parse a minus sign. So we're using the 'char' primitive

  • that we saw previously -- that will parse a minus sign. Then we're going to parse a

  • number, and call it 'n'. And then we need to remember that we need to make it

  • negative, so we negate it and then we return it. So again here, we're just using

  • the simple idea of sequencing three parsers, one after the other. And then the

  • 'or' here says, or we can just have a simple natural number. Okay so this

  • illustrates the idea of sequencing. And if you've seen some of my previous

  • videos, you may recognize the 'do' notation here. And this is because parsers form an

  • example of what's known as a 'monad'. And in fact, for me, parsers being monadic is

  • one of the key ways to understand what a monad is. So if you've seen the monad

  • video, or even if you haven't seen it, maybe you have a look back at that,

  • and if you find it interesting. maybe look up some of the work which people

  • have done on monadic parsing. And it's a really good way to get a very good feel

  • for what's going on with both these kind of parsers, and monads as well. We've seen

  • that parsers are basically functions -- they take a string as an input, and they

  • produce essentially a tree as an output. We've seen some basic primitives for

  • consuming single digits, and single characters, and things like that. And

  • we've seen some basic combining forms -- we can have repetition, with 'some', we can

  • have choice, and we can have sequencing. So we've got our basic kind of building

  • blocks for making larger parsers. So let's wrap this up now by doing a little

  • example. And the example I want to do is to build a really simple parser for the

  • kind of expressions, or arithmetic expressions, that we saw back at the

  • start. So things like two plus three times four. So what I'm doing here is I'm

  • writing a Haskell program, which is going to implement this parser. What I've got

  • in the first line here is simply importing the parsing library, which

  • we've just seen -- it's just four and a half pages of code -- it's very

  • straightforward. And what we've got here in the comments, is a simple way of

  • writing down what the syntax, or form, or structure, of expressions are.

  • And this is what's known as a 'grammar'. But it doesn't matter if you

  • don't know what a grammar is, because the basic idea is very simple here. The first

  • line says an expression can be one of two things, so this means 'or' here, in

  • grammars. So an expression can either be a term plus an expression, or it can be a

  • term. And then in turn, a term could either be a factor times a term, or a

  • factor. And finally, a factor can be a bracketed expression, or an integer. So

  • there's three simple rules here, which explain the form, or structure, of what a

  • simple arithmetic expression can be. There's actually quite a lot of things going on

  • here, but the only key thing I want to point out is we've got three rules,

  • because there's three different levels of priority in an expression. So the

  • highest level of priority is brackets. So that's one thing you learn in school, you

  • learn that you do the brackets first, that's the highest priority thing.

  • The middle level of priority here is multiplication, so that's thing in the

  • middle rule. And the lowest level of priority is you do addition, and that's

  • sitting at the top rule. And again, this priority order is something you learn at

  • school -- you do brackets first, then you do multiplication second, and then you do

  • things like addition last. And these three rules are just making that precise.

  • And again, if you want to know more about grammars, you can search on that,

  • and you'll find a lot of information about that online.

  • So what we want to do now, is take this little grammar, and implement it as an

  • actual parser. And this is very straightforward to do, because we're

  • using this functional parsing idea. Essentially, we just take those three

  • grammatical rules, and we just implement them using the combining forms, and the

  • primitives that we've seen. So it's a very straightforward translation.

  • So the first one, is we want to say an expression can either be a term plus an

  • expression, or a term. So let's do the first part of that. So we've got term

  • plus expression. So what we're going to do is parse a term, and if that succeeds

  • call it 'x'. Then we're going to parse a '+' character, then we're going to parse

  • an expression and call it 'y', and then we're going to return x plus y. So there's

  • four things going on in sequence here. We're first of all parsing a term.

  • Then we're parsing the '+' character. Then we're parsing an expression, and

  • we're getting the values x and y -- these will be numbers. And then we're going to

  • simply add those two numbers together. So you can see here we're actually doing

  • more than just parsing -- we're actually evaluating the expression as well. And

  • that's one of the advantages of this approach to parsing, that it's not just

  • about building a tree, you can actually process things as you're going along. And

  • we're actually processing them here by doing complete evaluation. So x and y

  • will be numbers which, result from this term and this expression, and then we just

  • add them together here, and return the result. Then the last part of parsing an

  • expression is we can either be a term plus an expression, or we could be a term.

  • So we just use the choice operator, and we get term. And these five lines here

  • are our full parser for expressions. So we had this one rule up here -- an

  • expression could be a term plus an expression, or a term, and we just

  • translate it directly into our parsing notation. And again the key observation

  • here is that the grammatical rule here looks basically the same as the parser.

  • So let's look at the second rule, and in fact it's pretty much the same as the

  • first rule, except a few symbols are changed. So let me just copy it. And I can

  • just change it. So I can say a term can be a factor times a term, and then I can

  • do a multiply there, or I get a factor. Okay, so in just a few key presses, I've

  • managed to implement my parser for terms. And again the point to note here is that

  • the grammatical rule here looks basically exactly the same as the parser.

  • Okay, I've just got a few more symbols in here, because I'm actually writing a

  • program to do parsing, or actually evaluation, but it's the same basic

  • structure. And then just to wrap things up, I can implement what a factor is.

  • So a factor is either a bracketed expression, or it's an integer. So let me

  • write the parser for that. So a bracketed expression, I just parse a character, and

  • then I parse as an expression and call it 'x', and then I parse a closed bracket, and

  • then I return the 'x'. Or, I can parse an integer. And again, if you look at the

  • structure of the rule up here -- a factor is a bracketed expression or an integer.

  • I've got exactly the same thing down here -- here I'm parsing a bracketed

  • expression, and here I'm parsing an integer. And this is actually our entire

  • parser. We've got three lines up here, and we've just got kind of ten or fifteen

  • lines down here, and this is actually a complete parser, and evaluator,

  • for arithmetic expressions. And again, the beauty of this approach is

  • the parser looks basically the same as the grammar. So let's try and see if this

  • works, and hopefully I haven't made any mistakes. So let's load it into the

  • system. Okay that's great, no errors. And we can see that we've loaded two files

  • in now -- we've loaded the parsing file in, which is about four and a half pages

  • of definitions, and we've loaded the example program in now. So now we can check,

  • does our parser actually do what we want. So let's try out our parser with the

  • little example that we had at the start, 2 plus 3 times 4. So we're going

  • to parse an expression, and the expression is 2 plus 3 times 4.

  • And we press return, and we hope we get the right result, and we do. So remember

  • from school, you do the multiply first, you do the 3 times 4, so you get 12,

  • and then you add the 2 on at the end, and you get 14. So we've managed to

  • get the result 14 here, and there's no portion of the input string left. So

  • we've got a successful result -- we've got a list, and we've got one result value,

  • we've got 14, and we've managed to consume the whole thing. Or we could check,

  • does this actually work with more sophisticated examples? So let's try

  • putting some brackets in, and let's put brackets around the 2 plus 3, so we get 2

  • plus 3 times 4. So we hope then that we do the addition first, and we get 5, and

  • then times by 4 to get 20. Yes, and it works. We can try more sophisticated

  • examples. So let's do something like 2 plus 7 times 10 plus 8 times 20, and if

  • I've got the brackets right, yes then it works fine. We can also check what

  • happens if you give it something which doesn't parse. So suppose I do something

  • like, I parse expression, so 2 plus 3 times, and I forget to write the 4 at the

  • end. What's going to happen? Well, the parser will still manage to

  • succeed, because it will manage to parse the 2 plus 3, and we'll get the 5 out,

  • but it doesn't know what to do with this symbol sitting on its own. So you get

  • that back as an unconsumed part of the input. And again, we can try another

  • example. Suppose I forget to close the brackets, so I do something like 2 plus 3,

  • and I forget to close the brackets, then it won't know what to do with that at

  • all, and we'll just get the empty string. That's basically it -- this is the idea of

  • functional parsing, or combinator parsing. The idea is very simple --

  • parsers are basically functions, you define a library with some

  • basic building blocks, or primitives, some combining forms that let you put these

  • things together, and then you can end up writing parsers as we've seen that look

  • very similar to the grammars that you write to describe languages.

Today we're going to do some more live coding, and we're going to talk

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級

功能解析 - Computerphile (Functional Parsing - Computerphile)

  • 1 0
    林宜悉 發佈於 2021 年 01 月 14 日
影片單字