字幕列表 影片播放 列印英文字幕 Last time you [Sean] did a really good animation about top-down parsing. We had these sentences in this totally artificial grammar, which has only got about 30 words in it, all in all. But we started here with is and, in the top-down approach, you basically say: "OK, I'll push those on the stack in reverse order: " And, starting at the top, of the stack, at the left, you say: "Well, what can a be? So, you do these things on your stack, top down, one at a time, and you start off at the root of the tree and you develop the component parts, left to right, one at a time. It is a lot easier, totally by hand, to write a top-down parser rather than a bottom-up one. But what I shall be getting to, later on and in fact I'm starting right now, is to say: "Well what happens if instead of starting up at the root and developing the leaves of the tree, as it were, you start right down at the bottom, with the text string that you know is correct, and try and work upwards you know can you work back upwards, from the leaves to the root?" So, in fact let's write that test sentence below here: the robot stroked two furry ..." And actually, since layout doesn't matter too much at the moment I'll squeeze the word 'dice' in at the right there. Now, be clear, we're talking about bottom-up parsing now. In bottom-up parsing you start with the string that you think is correct and you say: "Starting with a string can I look into the rules and see how to work up the tree, not down the tree?" And, yes, so therefore you're looking at possible matches on the right-hand side [of the rules] for components of this string, reading from left to right. OK 'the'. How many ways are there you can match the string 'the' against one of these classifications here? Well the text - the string 'the' - is the right-hand side possibility of an icle. That's one way to do it. Oh! and then look up here, right at the top of the grammar, if you have an icle followed by a , like 'the' followed by something else, that could be a ect, and that's looking good because right up at the top of the grammar we want to end up with ect . Looking just at "the robot", and looking at the grammar right-hand sides, I could do it by saying: "Well it's the subject of the sentence, it's at the left-hand side, and if I go icle I get 'the' and I get 'robot'. But what I've done, just to act as a talking point and it illustrates a lot of things here, I've given you a shortcut. If you want to, you can just do "the robot", with no further interior analysis at all. It's an allowed phrase; it's the ect. Now I have to say that, as we develop this story, we will get into bottom-up parsing. Because one of the tools were going to use called 'yacc' basically produces bottom-up parsers for you, not top-down ones. And it's a yacc behaviour symptom that it loves, when you're trying to match text strings, it likes to match the longest one that it can [legally] manage. So it is going to seize on "the robot", all as one phrase, as being a wonderful long solution. Why doesn't it use 'the' and then wait patiently for ? That's not the bottom up way. If you can see a longer ... Oh! and this thing by the way, that you're looking at, is built upon a stack, of course, it's called the 'handle'. I get a longer handle by going for this option here and getting "the robot", all in one go. OK "the robot" then, and you've got to get used to reading from right to left now, in bottom-up parsing, "the robot" all as one phrase is an example of a ect. So, we can now say: "OK, 'the robot' is my ect". Now that act - of picking up a substring from your sentence and going upwards, and making it more abstract if you like - it's called 'reducing', in bottom-up parsing. So, looking for a longer and longer and longer string, to get your handle, that's called 'shifting', because you're shifting characters, one after another, and making the string longer and longer and saying "... can I go any further?" That's shifting but when you say: "Ooh! that's a nice long string - and it matches" and then you go up and say: "Oh! that's my subject", that is called 'reduction' because you're going to something simpler further up the tree. So, you can tick that off as being done bottom-up. [The] next thing is, you see this string of characters called 'stroked' and, once again, it's right-hand-side driven. What is there, on the right hand side, and which rule is it, that could possibly match 'stroked'? You see in here, against , 'bit', 'kicked' or 'stroked'. Those three strings are your possibilities. So, that's fine. Going right to left you say 'stroked' is an example of a . So we've got our there. Now,again, I've cheated but it's wonderful fun! I have not analyzed "two furry dice" into adjectives and nouns or anything like that. I've just put it in as a interesting short-cut to have there. And it is an example of what I would call an object-phrase Some of you, who are really good English linguists, may want to go on about my lack of understanding about what a direct and indirect object are - not to mention 'predicates' and so on. But please forgive me. I regard it as being a phrase in an object position. So, I'm saying there's a quick match here and bottom-up I love this: "two furry dice" is a great long handle. Oh! and if I match it there, what's the left-hand side it corresponds to? [Answer is] . OK then, we've won! We have worked bottom-up to having ect ect on our stack starting with the string. And, what's more, we've exhausted the [input] string now. It's the end of it. There's a sort of full stop after that. There we are then. We've got top-down which tends to be more - how shall we say? Eager?- you know a top-down parse would very probably leap on the word 'the', and not bother to go any further because it's found a quick match for it, whereas bottom-up is the other way round. It's basically saying: "I want the longest possible handle". Even at this stage in the late 50s and early 60s there was a sneaking suspicion, coming around, that actually bottom-up parsing was a little bit more powerful than top-down. I'm going to put out a set of notes for this so that you can look up for yourself. Just examples of why it [i.e. bottom up] is more powerful. But roughly speaking I think you can sense that because you've not only got something you're looking for but you've [also] got a handle that you've already accumulated, it's like gathering more contextual information - going bottom-up. But, on the other hand, handling the stack and working out what's happening is a darn sight more complicated - if you do it by hand - coming bottom up. Rather than doing it all by hand, why not me and you lot [together]. It's a good way to learn 'lex' and 'yacc' In other words don't write the C directly yourself. Get a software tool, like these two, to do it for you. So, that's exactly what we're going to do. I've got the program 'putty' that does 'ssh' connected here. I'm talking to my other Linux machine in the other room, where I have got set up a parser - complete parser: front-end lexical analyzer [then] syntax analysis - for this 'furry grammar' and all legal sentences in it. And I know, first of all, you will want me to call up the program that implements this and the test sentence first of all is: [In unison] "the robot stroked two furry dice". Here we go! So, "furry". It's hanging there, it's waiting for us to give a correct furry sentence [reads from screen] "... dice ... return" Look at that! It's happy with it! I'm giving it in subject-verb-object order and I have numbered those rules, in the grammar as I showed you earlier, and I now have, as it were, a map of how it has parsed it. Rule 3? Now that is the one that effectively says I can do "the robot" all as one phrase. It has chosen not to go for 'the' and 'robot' as icle and , as separate entities. It might well have done that, had I gone top-down, but because this yacc-confected parser system goes bottom-up it's gone for the longest possible handle at that stage and it's matched it. Rule 4: the middle piece is matched 'stroked' as the verb and finally it has spotted, right at the very end, that I put in another sneaky short-cut to "two furry dice" is Rule 6. And that is my parse. So, should we try one more just to make sure? Go on Sean, tell me which one to try next. >> Sean: Try "the woman bit the dog" >> DFB: Yep, "the woman bit the dog" There you are look - Rule 2 for "the woman" now. Rule 2 not rule 3. if it has followed Rule 2 it's gone down the icle route, which means it knows that's the only way to match "the woman". There is no shortcut way. OK? Rule 4 - a rule again; it chose 'bit' Rule 5: now this time, again, there is no short-cut to "two furry dice" at Rule 6 You've got to go the long way around and following Rule 5, you break it down into icle again: 'the' and 'dog' So, there we are. You could say well you've written a compiler for the 'furry' language, with the help of lex and yacc. We could go into details of that later, if need be, but not now. It's fair enough but it's not doing anything really is it? What more shall we do with this, now we've written it this far then Sean? You tell me ?! >> Sean: Well I think next time we need to come up with an action, it needs to do something .... >> DFB: We need to transform that grammar in some way. Those of you who, in the previous video, actually bothered to look at the EXTRA BITS may have had a sneak preview as to what we're going to do, as our much more interesting actions now we have recognized the innate structure. So, remember - always watch the EXTRA BITS !
B1 中級 自下而上的解析 - Computerphile (Parsing Bottom Up - Computerphile) 4 0 林宜悉 發佈於 2021 年 01 月 14 日 更多分享 分享 收藏 回報 影片單字