字幕列表 影片播放
-
Last time you [Sean] did a really good animation about top-down parsing. We had
-
these sentences in this totally artificial grammar, which has only got
-
about 30 words in it, all in all. But we started here with is
-
and, in the top-down approach, you basically say: "OK, I'll push those on
-
the stack in reverse order: " And, starting at the top, of the
-
stack, at the left, you say: "Well, what can a be? So, you do these things on
-
your stack, top down, one at a time, and you start off at the root of the tree
-
and you develop the component parts, left to right, one at a time. It is a lot
-
easier, totally by hand, to write a top-down parser rather than a bottom-up one.
-
But what I shall be getting to, later on and in fact I'm starting right now, is to
-
say: "Well what happens if instead of starting up at the root and developing
-
the leaves of the tree, as it were, you start right down at the bottom, with the
-
text string that you know is correct, and try and work upwards you know can you
-
work back upwards, from the leaves to the root?"
-
So, in fact let's write that test sentence below here: the robot stroked two furry ..."
-
And actually, since layout doesn't matter too much at the moment I'll squeeze the
-
word 'dice' in at the right there. Now, be clear, we're talking about
-
bottom-up parsing now. In bottom-up parsing you start with the string that
-
you think is correct and you say: "Starting with a string can I look into the
-
rules and see how to work up the tree, not down the tree?" And, yes, so therefore
-
you're looking at possible matches on the right-hand side [of the rules] for components of
-
this string, reading from left to right. OK 'the'. How many ways are there you
-
can match the string 'the' against one of these classifications here? Well the
-
text - the string 'the' - is the right-hand side possibility of an icle. That's
-
one way to do it. Oh! and then look up here, right at the top of the grammar, if
-
you have an icle followed by a , like 'the' followed by something else,
-
that could be a ect, and that's looking good because right up at the top of the
-
grammar we want to end up with ect . Looking just at "the robot",
-
and looking at the grammar right-hand sides, I could do it by saying: "Well it's
-
the subject of the sentence, it's at the left-hand side, and if I go icle
-
I get 'the' and I get 'robot'. But what I've done, just to act as a talking point
-
and it illustrates a lot of things here, I've given you a shortcut. If you want to,
-
you can just do "the robot", with no further interior analysis at all. It's an
-
allowed phrase; it's the ect. Now I have to say that, as we develop this
-
story, we will get into bottom-up parsing. Because one of the tools were going to
-
use called 'yacc' basically produces bottom-up parsers for you, not top-down ones.
-
And it's a yacc behaviour symptom that it
-
loves, when you're trying to match text strings,
-
it likes to match the longest one that it can [legally] manage. So it is going to seize on
-
"the robot", all as one phrase, as being a wonderful long solution. Why doesn't it
-
use 'the' and then wait patiently for ? That's not the bottom up way.
-
If you can see a longer ... Oh! and this thing by the way, that you're looking at, is
-
built upon a stack, of course, it's called the 'handle'. I get a longer handle by
-
going for this option here and getting "the robot", all in one go. OK "the robot"
-
then, and you've got to get used to reading from right to left now,
-
in bottom-up parsing, "the robot" all as one phrase is an example of a ect. So, we
-
can now say: "OK, 'the robot' is my ect". Now that act - of picking up a substring
-
from your sentence and going upwards, and making it more abstract if you like - it's
-
called 'reducing', in bottom-up parsing. So, looking for a longer and longer and
-
longer string, to get your handle, that's called 'shifting', because you're shifting
-
characters, one after another, and making the string longer and longer and saying
-
"... can I go any further?" That's shifting but when you say: "Ooh! that's a nice long
-
string - and it matches" and then you go up and say: "Oh! that's my subject", that is
-
called 'reduction' because you're going to something simpler further up the tree.
-
So, you can tick that off as being done bottom-up. [The] next thing is, you see this
-
string of characters called 'stroked' and, once again, it's right-hand-side driven.
-
What is there, on the right hand side, and which rule is it, that could possibly
-
match 'stroked'? You see in here, against , 'bit', 'kicked' or 'stroked'. Those three
-
strings are your possibilities. So, that's fine. Going right to left you say 'stroked'
-
is an example of a . So we've got our there. Now,again,
-
I've cheated but it's wonderful fun! I have not analyzed "two furry dice" into
-
adjectives and nouns or anything like that. I've just put it in as a interesting
-
short-cut to have there. And it is an example of what I would call an object-phrase
-
Some of you, who are really good English linguists, may want to go on about
-
my lack of understanding about what a direct and indirect object are - not to
-
mention 'predicates' and so on. But please forgive me. I regard it as being a phrase
-
in an object position. So, I'm saying there's a quick match here and bottom-up
-
I love this: "two furry dice" is a great long handle. Oh! and if I match it
-
there, what's the left-hand side it corresponds
-
to? [Answer is] . OK then, we've won! We have worked bottom-up to having ect ect
-
on our stack starting with the string. And, what's more, we've exhausted the
-
[input] string now. It's the end of it. There's a sort of full stop after that.
-
There we are then. We've got top-down which tends to be
-
more - how shall we say? Eager?- you know a top-down parse would very probably leap on the
-
word 'the', and not bother to go any further because it's found a quick match for it,
-
whereas bottom-up is the other way round. It's basically saying: "I want the longest
-
possible handle". Even at this stage in the late 50s and early 60s there
-
was a sneaking suspicion, coming around, that actually bottom-up parsing was a
-
little bit more powerful than top-down. I'm going to put out a set of notes for
-
this so that you can look up for yourself. Just examples of why it [i.e. bottom up] is more
-
powerful. But roughly speaking I think you can sense that because you've not
-
only got something you're looking for but you've [also] got a handle that you've
-
already accumulated, it's like gathering more contextual information - going
-
bottom-up. But, on the other hand, handling the stack and working out what's
-
happening is a darn sight more complicated - if you do it by hand - coming bottom up.
-
Rather than doing it all by hand, why not me and you lot [together]. It's a good way
-
to learn 'lex' and 'yacc' In other words don't write the C directly yourself. Get
-
a software tool, like these two, to do it for you. So, that's exactly what
-
we're going to do. I've got the program 'putty' that does 'ssh' connected here.
-
I'm talking to my other Linux machine in the other room, where I have got set up a
-
parser - complete parser: front-end lexical analyzer [then] syntax analysis - for this
-
'furry grammar' and all legal sentences in it. And I know, first of all, you will want me
-
to call up the program that implements this and the test sentence first of all is:
-
[In unison] "the robot stroked two furry dice". Here we go!
-
So, "furry". It's hanging there, it's waiting for us to give a correct furry sentence [reads from screen] "... dice ... return"
-
Look at that! It's happy with it! I'm giving it in subject-verb-object
-
order and I have numbered those rules, in the grammar as I showed you earlier, and
-
I now have, as it were, a map of how it has parsed it. Rule 3?
-
Now that is the one that effectively says I can do "the robot" all as one
-
phrase. It has chosen not to go for 'the' and 'robot' as icle and ,
-
as separate entities. It might well have done that, had I gone top-down, but because this yacc-confected
-
parser system goes bottom-up it's gone for the longest possible
-
handle at that stage and it's matched it. Rule 4: the middle piece is matched
-
'stroked' as the verb and finally it has spotted, right at the very end, that I put
-
in another sneaky short-cut to "two furry dice" is Rule 6. And that is my parse.
-
So, should we try one more just to make sure? Go on Sean, tell me which one to try next.
-
>> Sean: Try "the woman bit the dog" >> DFB: Yep, "the woman bit the dog"
-
There you are look - Rule 2 for "the woman" now. Rule 2 not rule 3. if it has followed Rule 2 it's
-
gone down the icle route, which means it knows that's the only way to
-
match "the woman". There is no shortcut way. OK?
-
Rule 4 - a rule again; it chose 'bit' Rule 5: now this time, again, there is no
-
short-cut to "two furry dice" at Rule 6 You've got to go the long way around and
-
following Rule 5, you break it down into icle again: 'the' and 'dog'
-
So, there we are. You could say well you've written a compiler for the
-
'furry' language, with the help of lex and yacc. We could go into details of that
-
later, if need be, but not now. It's fair enough but it's not doing anything really is it?
-
What more shall we do with this, now we've written it this far then Sean?
-
You tell me ?! >> Sean: Well I think next time we need to come up with an action, it needs to do something ....
-
>> DFB: We need to transform that grammar in some way.
-
Those of you who, in the previous video, actually bothered to look at the EXTRA BITS
-
may have had a sneak preview as to what we're going to do, as our much more
-
interesting actions now we have recognized the innate structure.
-
So, remember - always watch the EXTRA BITS !