## 字幕列表 影片播放

• In our previous lesson, we talked about binary trees in general. Now, in this lesson we are

• going to talk about binary search tree, a special kind of binary tree which is an efficient

• structure to organize data for quick search as well as quick update. But before I start

• talking about binary search tree, I want you to think of a problem. What data structure

• will you use to store a modifiable collection? So, lets say you have a collection and it

• can be a collection of any data type, records in the collection can be of any type. Now,

• you want to store this collection in computers memory in some structure and then you want

• to be able to quickly search for a record in the collection and you also want to be

• able to modify the collection. You want to be able to insert an element in the collection

• or remove an element from the collection. So, what data structure will you use? Well,

• you can use an array or a linked list. These are two well known data structure in which

• we can store a collection. Now, what will be running time of these operations - search,

• insertion or removal, If we will use an array or a linked list. Lets first talk about arrays

• and for sale of simplicity, lets say we want to store integers. To store a modifiable list

• or collection of integers, we can create a large enough array and we can store the records

• in some part of the array. We can keep the end of the list marked. In this array that

• I am showing here, we have integers from 0 till 3. We have records from 0 till 3 and

• rest of the array is available space. Now to search some X in the collection, we will

• have to scan the array from index 0 till end and in worst case, we may have to look at

• all the elements in the list. If n is the number of elements in the list, time taken

• will be proportional to n or in other words, we can say that time complexity will be big-oh

• of n. Ok, now what will be the cost of insertion. Lets say we want to insert number 5 in this

• list. So, if there is some available space, all the cells in yellow are available, we

• can add one more cell by incrementing this marker 'end' and fill in the integer to be

• added. Time taken for this operation will be constant. Running time will not depend

• upon number of elements in the collection. So, we can say that time complexity will be

• big-oh of 1. Ok, now what about removal. Lets say we want to remove 1 from the collection.

• What we'll have to do is, we'll have to shift all records to the right of one by one position

• to the left and then we can decrement end. The cost of removal once again will be big-oh

• of n. In worst case, we will have to shift n-1 elements. Here, the cost of insertion

• will be big-oh of 1, if the array will have some available space. So, the array has to

• be large enough. If the array gets filled, what we can do is, we can create a new larger

• array, typically we create an array twice the size of the filled up array. So, we can

• create a new larger array and then we can copy the content of the filled up array into

• this new larger array. The copy operation will cost us big-oh of n. We have discussed

• this idea of dynamic array quite a bit in our previous lessons. So, insertion will be

• big-oh of 1 if array is not filled up and it will be big-oh of n if array is filled

• up. For now, lets just assume that the array will always be large enough. Lets now discuss

• the cost of these operations if we will use a linked list. If we would use a linked list,

• I have drawn a linked list of integers here, data type can be anything, the cost of search

• operation once again will be big-oh of n where n is number of records in the collection or

• number of nodes in the linked list. To search, in worst case, we will have to traverse the

• whole list. We will have to look at all the nodes. The cost of insertion in a linked list

• is big-oh of 1 at head and its big-oh of n at tail. We can choose to insert at head to

• keep the cost low. So, running time of insertion, we can say is big-oh of 1 or in other words,

• we will take constant time. Removal once again will be big-oh of n. We will first have to

• traverse the linked list and search the record and in worst case, we may have to look at

• all the nodes. Ok, so this is the cost of operations if we are going to use array or

• linked list. Insertion definitely is fast. But, how good is big-oh of n for an operation

• like search. What do you think? if we are searching for a record X, then in the worst

• case, we will have to compare this record X with all the n records in the collection.

• Lets say, our machine can perform a million comparisons in one second. So, we can say

• that machine can perform 10 the power 6 comparisons in one second. So, cost of one comparison

• will be 10 to the power -6 second. Machines in today's world deal with really large data.

• Its very much possible for real world data to have 100 million records or billion records.

• A lot of countries in this world have population more than 100 million. 2 countries have more

• than a billion people living in them. If we will have data about all the people living

• in a country, then it can easily be 100 million records. Ok, so if we are saying that the

• cost of 1 comparison is 10 the power -6 second. If n will be 100 million, time taken will

• be 100 seconds. 100 seconds for a search is not reasonable and search may be a frequently

• performed operation. Can we do something better? Can we do better than big-oh of n. Well, in

• an array, we can perform binary search if its sorted and the running time of binary

• search is big-oh of log n which is the best running time to have. I have drawn this array

• of integers here. Records in the array are sorted. Here the data type is integer. For

• some other data type, for some complex data type, we should be able to sort the collection

• based on some property or some key of the records. We should be able to compare the

• keys of records and the comparison logic will be different for different data types. For

• a collection of strings, for example, we may want to have the records sorted in dictionary

• or lexicographical order. So, we will compare and see which string will come first in dictionary

• order. Now this is the requirement that we have for binary search. The data structure

• should be an array and the records must be sorted. Ok, so the cost of search operation

• can be minimized if we will use a sorted array. But, in insertion or removal, we will have

• to make sure that the array is sorted afterwards. In this array, if I want to insert number

• 5 at this stage, i can't simply put 5 at index 6. what I'll have to do is, I'll first have

• to find the position at which I can insert 5 in the sorted list. We can find the position

• in order of log n time using binary search. We can perform a binary search to find the

• first integer greater than 5 in the list. So, we can find the position quickly. In this

• case, its index 2. But then, we will have to shift all the records starting this position

• one position to the right. And now, I can insert 5. So, even though we can find the

• position at which a record should be inserted quickly in big-oh of log n, this shifting

• in worst case will cost us big-oh of n. So, the running time overall for an insertion

• will be big-oh of n and similarly, the cost of removal will also be big-oh of n. We will

• have to shift some records. Ok, so when we are using sorted array, cost of search operation

• is minimized. In binary search for n records, we will have at max log n to the base 2 comparisons.

• So, if we can perform million comparisons in a second, then for n equal 2 to the power

• 31 which is greater than 2 billion, we are going to take only 31 micro-seconds. log of

• 2 to the power 31 to base 2 will be 31. Ok, we are fine with search now. we will be good

• for any practical value of n. But what about insertion and removal. They are still big-oh

• of n. Can we do something better here? Well, if we will use this data structure called

• binary search tree, I am writing it in short - BST for binary search tree, then the cost

• of all these 3 operations can be big-oh of log n in average case. The cost of all the

• operations will be big-oh of n in worst case. But we can avoid the worst case by making

• sure that the tree is always balanced. We had talked about balanced binary tree in our

• previous lesson. Binary search tree is only a special kind of binary tree. To make sure

• that the cost of these operations is always big-oh of log n, we should keep the binary

• search tree balanced. We'll talk about this in detail later. Let's first see what a binary

• search tree is and how cost of these operations is minimized when we use a binary search tree.

• Binary search tree is a binary tree in which for each node, value of all the nodes in left

• sub-tree is lesser and value of all the nodes in right sub-tree is greater. I have drawn

• binary tree as a recursive structure here. As we know, in a binary tree, each node can

• have at most 2 children. We can call one of the children left child. If we will look at

• the tree as recursive structure, left child will be the root of left sub-tree and similarly,

• right child will be the root of right sub-tree. Now, for a binary tree to be called binary

• search tree, value of all the nodes in left sub-tree must be lesser or we can say lesser

• or equal to handle duplicates and the value of all the nodes in right sub-tree must be

• greater and this must be true for all the nodes. So, in this recursive structure here,

• both left and right sub-trees must also be binary search trees. I'll draw a binary search

• tree of integers. Now, I have drawn a binary search tree of integers here. Lets see whether

• this property that for each node value of all the nodes in left subtree is lesser or

• equal and the value of all the nodes in right sub-tree must be greater is true or not. Lets

• first look at the root node. Nodes in the left subtree have values 10, 8 and 12. So,

• they are all lesser than15 and in right subtree, we have 17, 20 and 25, they are all greater

• than 15. So, we are good for the root node. Now, lets look at this node with value 10.

• In left, we have 8 which is lesser. In right, we have 12 which is greater. So, we are good.

• We are good for this node too having value 20 and we don't need to bother about leave

• nodes because they do not have children. So, this is a binary search tree. Now, what if

• I change this value 12 to 16. Now, is this still a binary search tree. Well, for node

• with value 10, we are good. The node with value 16 is in its right. So, not a problem.

• But for the root node, we have a node in left sub-tree with higher value now. So, this tree

• is not a binary search tree. I'll revert back and make the value 12 again. Now, as we were

• saying we can search, insert or delete in a binary search tree in big-oh of log n time

• in average case. How is it really possible? Lets first talk about search. If these integers

• that I have here in the tree were in a sorted array, we could have performed binary search

• and what do we do in binary search. Lets say we want to search number 10 in this array.

• What we do in binary search is, we first define the complete list as our search space. The

• number can exist only within the search space. I'll mark search space using these two pointers

• - start and end. Now, we compare the number to be searched or the element to be searched

• with mid element of the search space or the median. And if the record being searched,

• if the element being searched is lesser, we go searching in the left half, else we go

• searching in the right half. In case of equality, we have found the element. In this case, 10

• is lesser than 15. So, we will go searching towards left. Our search space is reduced

• now to half. Once again, we will compare to the mid-element and bingo, this time, we have

• got a match. In binary search, we start with n elements in search space and then if mid

• element is not the element that we are looking for, we reduce the search space to n/2 and

• we go on reducing the search space to half, till we either find the record that we are

• looking for or we get to only one element in search space and be done. In this whole

• reduction, if we will go from n to n/2 to n/4 to n/8 and so on, we will have log n to

• the base 2 steps. If we are taking k steps, then n upon 2 to the power k will be equal

• to 1 which will imply 2 to the power k will be equal to n and k will be equal to log n

• to the base 2. So, this is why running time of binary search is big-oh of log n. Now,

• if we will use this binary search tree to store the integers, search operation will

• be very similar. Lets say, we want to search for number 12. What we'll do is, we start

• at root and then we will compare the value to be searched, the integer to be searched

• with value of the root. if its equal, we are done with the search, if its lesser, we know

• that we need to go to the left sub-tree because in a binary search tree, all the elements

• in left sub-tree are lesser and all the elements in right sub-tree are greater. Now, we will

• go and look at the left child of node with value 15. We know that number 12 that we are

• looking for can exist in this sub-tree only and anything apart from the sub-tree is discarded.

• So, we have reduced the search space to only these 3 nodes having value 10, 8 and 12. Now,

• once again, we will compare 12 with 10. We are not equal. 12 is greater, so we know that

• we need to go looking in right sub-tree of this node with value 10. So, now our search

• space is reduced to just one node. Once again, we will compare the value here at this node

• and we have a match. So, searching an element in binary search tree is basically this traversal

• in which at each step, we will go either towards left or right and hence at each step, we will

• discard one of the sub-trees. if the tree is balanced, we call a tree balanced if for

• all nodes, the difference between the heights of left and right sub-trees is not greater

• than 1. So, if the tree is balanced, we will start with a search space of n nodes and when

• we will discard one of the sub-trees, we will discard n/2 nodes. So, our search space will

• be reduced to n/2 and then in next step, we will reduce the search space to n/4. We will

• go on reducing like this till we find the element or till our search space is reduced

• to only one node when we will be done. So, the search here is also a binary search. And

• that's why the name - Binary Search Tree. This tree that I am showing here is balanced.

• In fact this is a perfect binary tree. But with same records, we can have an unbalanced

• tree like this. This tree has got the same integer values as we had in the previous structure

• and this is also a binary search tree, but this is unbalanced. This is as good as a linked

• list. In this tree there is no right sub-tree for any of the nodes. Search space will be

• reduced by only one.at each step. From n nodes in search space, we will go to n-1 nodes and

• then to n-2 nodes, all the way till 1 will be n steps. In binary search tree, in average

• case, cost of search, insertion or deletion is big-oh of log n and in worst case, this

• is the worst case arrangement that I am showing you. The running time will be big-oh of n.

• We always try to avoid the worst case by trying to keep the binary search tree balanced. With

• same records in the tree, there can be multiple possible arrangements. For these integers

• in this tree, another arrangement is this. For all the nodes, we have nothing to discard

• in left sub-tree in a search. This is another arrangement. This is still balanced because

• for all the nodes, the difference between the heights of left and right sub-tree is

• not greater than 1. But this is the best arrangement when we have a perfect binary tree. At each

• step, we will have exactly n/2 nodes to discard. Ok, now to insert some records in binary search

• tree, we will first have to find the position at which we can insert and we can find the

• position in big-oh of log n time. Lets say we want to insert 19 in this tree, what we

• will do is start at the root. If the value to be inserted is lesser or equal, if there

• is no child, insert as left child or go left. If the value is greater and there is no right

• child, insert as right child or go right. In this case, 19 is greater, so we will go

• right. Now, we are at 20. 19 is lesser and left sub-tree is not empty. We have a left

• child. So, we will go left. Now, we are at 17, 19 is greater than 17. So, it should go

• in right of 17. There is no right child of 17. So, we will create a node with value 19

• and link it to this node with value 17 as right child. Because we are using pointers

• or references here just like linked list, no shifting is needed like an array. Creating

• a link will take constant time. So, overall insertion will also cost us like search. To

• delete also, we will first have to search the node. Search once again will be big-oh

• of log n and deleting the node will only mean adjusting some links. So, removal also is

• going to be like search - big-oh of log n in average case. Binary search tree gets unbalanced

• during insertion and deletion. So, often during insertion and deletion, we restore the balancing.

• There are ways to do it and we will talk about all of this in detail in later lessons. In

• next lesson, we will discuss implementation of binary search tree in detail. This is it

• for this lesson. Thanks for watching.

In our previous lesson, we talked about binary trees in general. Now, in this lesson we are

B1 中級

# 數據結構。二進制搜索樹 (Data structures: Binary Search Tree)

• 41 6
Hhart Budha 發佈於 2021 年 01 月 14 日