
  • Okay, so: artificial intelligence, machine learning, data mining, data analysis,

  • clustering, classification, data pre-processing,

  • big data.

  • It's hard to go anywhere now without hearing about AI and machine learning and, particularly, data.

  • It's everywhere. Research

  • has suggested that every two years we generate more data than ever existed before.

  • So the amount of data is doubling every two years now. That's an absolutely, you know, astronomical amount of data,

  • but the thing is, of course, that

  • this data doesn't necessarily mean anything. You can create tables of data,

  • but unless you understand what's in them and what they mean, you haven't got any knowledge, right?

  • So there's a distinction between having data and having knowledge. It's all very well saying, yes, as a species

  • we're producing a huge amount of data,

  • but actually a lot of it doesn't get used. A lot of it sits there on a hard disk

  • waiting for someone to look at it. And that's kind of what we're talking about here: if we want to extract knowledge from data,

  • we're going to need some tools and processes to do this in a formal way, and that's what data science is, right? And

  • things like machine learning and AI have a place within it.

  • So perhaps if you do this for your job, data analysis is going to be useful for you.

  • Maybe your company's generating data and you want to analyze it.

  • But on the other hand, perhaps you're just a consumer and companies are using data on you. They're generating data on you,

  • and actually they're profiting from data on you. Sometimes life-changing decisions are being made based on your data,

  • and so it's empowering to know how this process works. Here's a very simple example,

  • which you might even do yourself. Suppose you go online to book some flights for a holiday,

  • and then you decide that actually two flights via an intermediate airport are cheaper than a single flight, right?

  • You're doing data analysis.

  • You're taking lots of different data sources and working out the optimal route, and of course this also happens automatically,

  • depending on the flight website that you're using. All right, so you're already doing this kind of stuff.

  • It's just a case of trying to formalize this process. So what do any of the things I listed at the beginning mean?

  • Well, one problem is that everyone's definitions differ slightly,

  • but I also think a lot of these terms are used completely interchangeably. AI is the classic example.

  • AI is everywhere right now.

  • You can't buy a product without it having AI added to it. A lot of the time when you see AI,

  • we're actually talking about machine learning.

  • So machine learning is the idea that we're

  • training a machine to perform a task without explicitly

  • programming it to do so. A good example of AI that isn't machine learning would be, let's say, a mouse in a maze where all

  • you're doing is telling it to turn left or right at random. It's not learning anything.

  • It doesn't understand what the maze is, but it will eventually get to the end, right? That's a kind of rudimentary artificial intelligence

  • that doesn't involve learning anything.

  • Machine learning is about not giving it

  • conditions, not saying "if you're here, turn left; if you're here, turn right."

  • It's just giving it examples and hoping it will learn to perform the task itself, right?

  • So machine learning is a subset of AI, but the two terms shouldn't be used interchangeably. If we're using machine learning, often

  • what we'll do is train it on samples of data.

  • So we'll have some existing dataset that we're trying to train on, and we're trying to use machine learning to either tease out

  • information or make predictions on this data.

  • The problem is that not all data is made equal. Some of it is noisy and messy.

  • Maybe we don't know what it is, and we don't know whether we can apply a certain technique to it,

  • right? And so we need to clean this data up. We need to take this data, understand what it is and extract some knowledge,

  • so that we can then apply these AI or machine learning techniques to it.

  • So this combination of things that take data and prepare it in a way that we can then use it or understand it,

  • that's data science.

  • There are quite a few ways we could do this data analysis throughout this course.

  • We could use R, we could use Python, we could use MATLAB. They all have their pros and cons.

  • We're going to use R because it's free and it's really good for statistical analysis.

  • It's got loads of great libraries.

  • If you're really familiar with Python, then maybe that's what you want to start with for this kind of stuff,

  • but we're going to be working with R.

  • We have our script area here, where we can write and run scripts. You can save them and then come back to them later.

  • There's the console, where we're going to be putting in specific commands.

  • We have our environment, which is where all our

  • variables and our data are held, and we can look at them there. And then we have plots. You can do

  • quite a lot of different plots in R; it's very versatile. They're going to appear down here.

  • Okay, so you've probably got everything you need to get started with data analysis. In my opinion,

  • the best way to get into R is just to kind of have a go.

  • So let's look at a few of the most obvious things that it does. It has

  • a little bit of a learning curve, only because its syntax is slightly unusual.

  • If you can program, you'll be fine,

  • but even if not,

  • you should get there pretty quickly. Most of the time in R we'll be using either matrices or vectors,

  • which are kind of a special case of matrices, or maybe data frames. Data frames are a really nice aspect of R, which you can kind

  • of think of like a table that you might have in Excel, except you've also got headings for your columns.

  • So let's have a look at some of these things, and just a few of the things we can do with them, before we perhaps

  • go into a little bit more detail in other videos.

  • So, for example,

  • we might look at our variable x, which I've created. x is a sequence going from 0 all the way up to a few

  • multiples of pi, which I used to create this plot.

  • That was only one line of code, and I've used it to create my plot by essentially saying y equals sin x

  • and then simply plotting that. If you want to get a little bit more complicated, we can start looking at matrix data.

  • So I created a CSV file with a Gaussian function in it, so essentially a two-dimensional array of

  • values that get bigger in the center. Very straightforward.

  • The CSV file is essentially a text file with

  • commas separating those values. It's very easy to read and write these out of Excel and other

  • packages, and so you'll often find data is passed around in this way, at least

  • moderately sized data, if it isn't too huge. I can load this in using the read.csv function.

  • So I can say namedata.

  • Now, the arrow operator is essentially the assignment operator in R. Equals will often work as well,

  • but I tend to try and use this one. So namedata,

  • I'm going to assign read.csv, and the file is going to be norm.csv.

  • And I've got no header for this file, so I don't want it to use the top row for the labels.

  • So I'm going to say header equals

  • FALSE, and that's loaded into namedata. We can have a look, so I'm going to click on namedata here, and if we click

  • on it, you can see we've got the rows and the columns of our data in here.

  • We can look at individual elements in this array, so we can say data at position 3, 4,

  • right,

  • and that's going to be the third row down and the fourth value across. We can also leave one index empty and get an entire

  • row, or,

  • conversely, an entire column like this. So it's very easy to take ranges of values.

  • If you've got a huge table of data, you can select certain columns, look at certain columns and plot certain columns.

  • This is one of the reasons why R is very popular. Quite often when you're looking at data,

  • we'll actually be looking at something called a data frame. Now, a data frame (I've got one loaded up) is simply,

  • in essence, a table of values, but its columns don't all have to be the same type.

  • In an array, normally they'll all be floats or they'll all be integers. In a data frame, there can be different things,

  • so you could have first and last name next to age, for example.

  • So I've just created a tiny little CSV file with some random people in it. Let's load this up.

  • So I'm going to say namedata,

  • assign read.csv,

  • names.csv.

  • And if I look at namedata, you can see that it's got three columns:

  • first name, surname and age,

  • and five rows, so there are five people in this dataset. Then you can do just what I did before, but now we can also index

  • by the names of these columns. So I could say I want all of the first names, for example, so I can say namedata,

  • dollar,

  • firstname, and I can see

  • all the different first names. So you can start to look at this dataset in more detail. Obviously,

  • this is an absolutely tiny dataset, but you get the idea. You could also look at individual instances,

  • so we could say namedata, and I want just the second row, for example: namedata, second row.

  • There we go: Bill Jones, and he's 18 years old. As we move through these videos,

  • it's going to be very common for us to load in

  • datasets like this, in this format, and then start to process them based on these data frames. So, perhaps an example.

  • So let's imagine you're an online retailer, and someone comes into your shop and buys some things, and maybe you're

  • trying to understand what it is they do, so that you can, let's say, send them emails to try and get them to buy

  • more products, or show them recommended products and things like this.

  • So you want to try and build up a pattern of their behavior, right?

  • And all you've got is what they click on, what they add to their basket and what they buy, right?

  • So you've learned that they're looking at these kinds of items, and they look at these ones regularly,

  • and then sometimes they just buy something seemingly completely random, and that goes in their basket and gets bought straight away.

  • Maybe it's a present, right? So maybe it's not tied to them as a person.

  • So you're taking all of this data, all of these purchases, all of these

  • products they're looking at, and you're turning this into a kind of picture of this person, and you're clustering that person in with other

  • consumers that bought similar things and trying to predict what they want to buy next, right?

  • And that's when you send them an email saying, "You should look at this one, because this one's really good. You didn't buy it

  • last time, but you'll definitely want to buy it this time." So we've got some data and we want to extract some knowledge.

  • What's the first thing we do?

  • Well,

  • we have to start to look at it and try and tease out some kind of information,

  • right, or analyze this data. Data analysis is the idea of using statistical measures to try and work out what's going on.

  • This is kind of a cycle. We're going to analyze the data,

  • so we're going to do a data analysis, and perhaps sometimes just using statistics to analyze the data isn't enough.

  • You can't really learn everything about it.

  • Yes, you can learn, you know, mathematically how it works, but you might not understand what it all means.

  • So visualizing the data can be really helpful. So what we'll also do is visualize the data.

  • Visualization: that's going to be charting it, plotting it, trying to work out

  • trends and

  • links between different variables and things like this. And these two kind of go back and forth,

  • right? You could do both of these things numerous times and work out what you've got.

  • So you're going to do something like this, and then what we're going to do is pre-process the data.

  • Often you'll find you're recording much more data than you actually need, right? This is certainly true of an online shop.

  • I'm going to be looking at a lot of products

  • that I don't end up buying and was never really going to buy. You know, maybe a pipe dream. And they've got to sort

  • of weed out this information to work out what they might actually be able to convince me to buy, right?

  • So you're going to pre-process the data, remove the nonsense and drill right down to the stuff that's really useful.

  • So this is pre-processing, and this is going to be a kind of cycle of analysis, visualization

  • and

  • pre-processing. We can repeat these things, and then we can really drill down and whittle our data down into the most usable

  • core of knowledge that we can,

  • and get the most out of it. Now, it may be that just analyzing the data is enough, right?

  • You've now obtained some knowledge.

  • You kind of understand what the trends are, and maybe that was all you wanted to do. That's sometimes the case.

  • But maybe what we actually want to do is take things a little bit further.

  • We're going to use machine learning or modeling to try and model this system and predict what's going to happen next.

  • So, for example, in the case of an online shop,

  • we might want to start predicting what people are going to buy next, and if we can do that,

  • that's when we can send out these emails or flag things in their recommended items and get many more sales. As an example,

  • let's imagine that someone has spent a lot of time looking at DIY tools, right? I've, you know, recently moved house.

  • I spend a lot of time doing DIY, and I'm always trying to buy new tools because it just seems like a good idea.

  • So, you know, maybe I buy a certain kind of saw, and then a few months later they start recommending me

  • a slightly different kind of saw

  • that serves a slightly different purpose, which suddenly I definitely need, and I think, "Oh yeah,

  • maybe I will buy that." And then in the end I have 10 saws and I don't know how to use any of them.

  • But, you know, the retailer's job is done.

  • So if we want to make use of this data,

  • we're going to use machine learning or modeling to model this system and make predictions, right?

  • So, for example, we could cluster the data together. We could link my purchase history with similar people. What are they buying?

  • Can I be tempted to buy those things as well, right?

  • Or maybe I'm very different from someone else,

  • and so it's not a good idea to recommend certain products to me, because I'm unlikely to buy those things.

  • To use a different example, in the medical domain

  • it's quite common to classify people into risk categories, right, so that we can maybe use preventative treatments.

  • So every time I go to a doctor, they're going to collect data on me:

  • what's currently wrong with me and what was wrong with me before. And we

  • combine that with, you know, standard data

  • like how much exercise someone does, their family history,

  • what their stress levels are, and things like this.

  • We can combine all these things to make a prediction as to what they're at risk of in the future,

  • so, you know, heart disease or something else like this. It could save someone's life

  • if you spot that they're at risk of a certain thing and you can advise that person to, you know,

  • increase their level of exercise or alter their diet. There are two other terms that we come across, you know, a lot, right?

  • So there's data mining and big data. Now,

  • I'm not really sure what data mining is, because I don't think anyone is. It's a bit of a buzzword.

  • Really, data mining is a combination of pre-processing your data and maybe using clustering to extract some knowledge from it, right?

  • So it's a word that's come to be used in place of those things, right?

  • If someone says they're doing data mining, that's what they're doing: they're pre-processing and extracting some knowledge from their data.

  • It's a cool-sounding word, but you're not actually mining anything, right?

  • You're just doing what everyone else does with data. Big data is the idea that maybe we've collected a lot of examples of

  • something,

  • you know, a huge number, or each of our examples is quite complicated and has a lot of variables, right? In that case,

  • the amount of data we've got is sort of unwieldy, right?

  • So I would argue perhaps that big data is data that you can't process on your laptop. You might be using cloud compute

  • infrastructure, or certainly parallel processing in some way, to pre-process and analyze this data.

  • So exactly where the line is, how big is big, I don't know. But exactly where we draw the line in some ways

  • isn't really important, right? The idea is just that,

  • as we as a species produce more and more data, more and more of our data is becoming big data.

  • But, you know, exactly where the cutoff is doesn't really matter.

  • What is data, right? I'm pretty sure that's data.

  • Right? Is this data? This picture, or that data?

  • Is this data? What is data?
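The flight-booking example in the video (deciding that two flights via an intermediate airport beat a single direct flight) is a small data analysis in its own right. Below is a minimal Python sketch under stated assumptions: the airport names and fares are hypothetical, only direct and one-stop routes are considered, and leg fares are simply added. The function cheapest_route is illustrative, not code from the video.

```python
# Hypothetical fares in arbitrary units; these are not real prices.
DIRECT = {("HOME", "AWAY"): 1200}
LEGS = {
    ("HOME", "HUB1"): 450,
    ("HUB1", "AWAY"): 380,
    ("HOME", "HUB2"): 400,
    ("HUB2", "AWAY"): 900,
}


def cheapest_route(origin, dest, direct, legs):
    """Return (cost, route) for the cheapest direct or one-stop itinerary, or None."""
    best = None
    if (origin, dest) in direct:
        best = (direct[(origin, dest)], [origin, dest])
    # Try every intermediate airport that can be reached from the origin.
    for (start, via), first_fare in legs.items():
        if start != origin:
            continue
        second_fare = legs.get((via, dest))
        if second_fare is None:
            continue
        total = first_fare + second_fare
        if best is None or total < best[0]:
            best = (total, [origin, via, dest])
    return best


print(cheapest_route("HOME", "AWAY", DIRECT, LEGS))  # (830, ['HOME', 'HUB1', 'AWAY'])
```

Real booking sites combine many more data sources (dates, baggage, layover times), but the core step is the same comparison of alternatives.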
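The mouse in a maze was given as AI that doesn't involve learning. A minimal Python sketch of that idea follows, assuming a hypothetical grid maze and simplifying "turn left or right at random" to "step to a random open neighbouring square". The mouse keeps no memory, so it reaches the exit eventually but never gets any better at it.

```python
import random

# Hypothetical maze: '#' wall, '.' open, 'S' start, 'E' exit (the border is all wall).
MAZE = [
    "#######",
    "#S..#.#",
    "#.#.#.#",
    "#.#...#",
    "#.###.#",
    "#....E#",
    "#######",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def find(maze, ch):
    """Return the (row, col) of the first occurrence of ch in the maze."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not found in maze")


def random_mouse(maze, seed=None, max_steps=100_000):
    """Step to a random open neighbour each turn; return the steps taken to reach 'E'.

    There are no rules about where to go and no memory of where it has been,
    so nothing is learned. Returns None if the exit is not reached in max_steps.
    """
    rng = random.Random(seed)
    r, c = find(maze, "S")
    goal = find(maze, "E")
    for step in range(1, max_steps + 1):
        options = [(r + dr, c + dc) for dr, dc in MOVES if maze[r + dr][c + dc] != "#"]
        r, c = rng.choice(options)
        if (r, c) == goal:
            return step
    return None


print(random_mouse(MAZE, seed=1))
```

Running it with different seeds gives very different step counts, which is the point: a random walk finds the exit by chance, not by understanding the maze.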
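In contrast to the random mouse, machine learning was described as giving a program examples instead of conditions. One of the simplest ways to do that is a 1-nearest-neighbour classifier, sketched below in Python. The video does not name any particular algorithm, and the fruit measurements are made-up examples.

```python
import math

# Made-up training examples: (weight in grams, diameter in cm) -> label.
EXAMPLES = [
    ((150.0, 7.5), "apple"),
    ((170.0, 8.0), "apple"),
    ((5.0, 1.5), "grape"),
    ((7.0, 1.8), "grape"),
]


def nearest_neighbour(examples, query):
    """Predict a label for query by copying the label of the closest training example.

    There are no hand-written if/else conditions; the behaviour comes from the examples.
    """
    best_label, best_dist = None, math.inf
    for features, label in examples:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


print(nearest_neighbour(EXAMPLES, (160.0, 7.0)))  # apple
print(nearest_neighbour(EXAMPLES, (6.0, 1.6)))    # grape
```

In practice features are usually rescaled first; here weight dominates the distance simply because its numbers are larger.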
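The sine plot in the R demo comes from a sequence x running from 0 to a few multiples of pi and y = sin(x). In R that could be written as x <- seq(0, 4 * pi, length.out = 100) followed by plot(x, sin(x)); the upper limit of 4π and the 100 points are assumptions, since the video only says "a few multiples of pi". The Python sketch below mirrors seq with the standard library and leaves plotting out, because that needs a third-party library such as matplotlib.

```python
import math


def seq(start, stop, length_out):
    """Evenly spaced numbers from start to stop inclusive, like R's seq(start, stop, length.out = n)."""
    if length_out == 1:
        return [float(start)]
    step = (stop - start) / (length_out - 1)
    return [start + i * step for i in range(length_out)]


# Assumption: "a few multiples of pi" taken as 4*pi, sampled at 100 points.
x = seq(0.0, 4 * math.pi, 100)
y = [math.sin(v) for v in x]

# With matplotlib installed, the plot itself would be: plt.plot(x, y); plt.show()
```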
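The norm.csv demo loads a headerless CSV of Gaussian values (in R, read.csv("norm.csv", header = FALSE)) and then indexes it by row and column. The Python sketch below generates a comparable grid, round-trips it through CSV text and repeats the indexing. The 9 × 9 size and sigma of 2 are assumptions; the video's actual file contents are not given. Note that R counts from 1 and Python from 0, so R's data[3, 4] becomes data[2][3].

```python
import csv
import io
import math


def gaussian_grid(n, sigma):
    """n x n grid of exp(-(di^2 + dj^2) / (2 sigma^2)); values peak at the centre."""
    c = (n - 1) / 2
    return [
        [math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma**2)) for j in range(n)]
        for i in range(n)
    ]


def to_csv_text(rows):
    """Write rows as CSV text with no header row."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def read_headerless_csv(text):
    """Parse headerless CSV text into rows of floats (compare read.csv(..., header = FALSE) in R)."""
    return [[float(v) for v in row] for row in csv.reader(io.StringIO(text, newline="")) if row]


# Assumed size and width, standing in for the video's norm.csv.
data = read_headerless_csv(to_csv_text(gaussian_grid(9, 2.0)))

element = data[2][3]                   # R: data[3, 4]  (third row down, fourth value across)
third_row = data[2]                    # R: data[3, ]   (an entire row)
fourth_col = [row[3] for row in data]  # R: data[, 4]   (an entire column)
```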
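The names.csv data frame, with columns of different types and access by column name (namedata$firstname) or by row (namedata[2, ]), can be imitated in Python with csv.DictReader, which uses the header row as column names. The column names firstname, surname and age, and every row except Bill Jones (18), are hypothetical placeholders, since the transcript doesn't show the file.

```python
import csv
import io

# Hypothetical contents of names.csv. The column names are assumed, and every
# row except "Bill Jones, 18" (shown in the video) is a made-up placeholder.
NAMES_CSV = """firstname,surname,age
Alice,Brown,34
Bill,Jones,18
Carol,White,52
Dan,Green,27
Eve,Black,41
"""


def read_table(text):
    """Read CSV text whose first row holds column names; convert 'age' to int."""
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    for row in rows:
        row["age"] = int(row["age"])
    return rows


namedata = read_table(NAMES_CSV)
first_names = [row["firstname"] for row in namedata]  # R: namedata$firstname
second_row = namedata[1]                               # R: namedata[2, ]
print(second_row)  # {'firstname': 'Bill', 'surname': 'Jones', 'age': 18}
```

Unlike an R data frame, this is just a list of dictionaries, so the age column has to be converted to a number explicitly.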
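Data analysis was described as using statistical measures to work out what's going on. A first pass often looks like the summary below, loosely similar to R's summary() plus sd(). The basket totals in the usage line are hypothetical.

```python
import statistics


def summarise(values):
    """A few descriptive statistics, loosely like R's summary() plus sd()."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
        # Sample standard deviation (n - 1 denominator, as in R's sd()); undefined for one value.
        "sd": statistics.stdev(ordered) if len(ordered) > 1 else float("nan"),
    }


# Hypothetical basket totals from an online shop.
print(summarise([12.5, 30.0, 8.0, 45.0, 30.0, 9.5]))
```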
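The pre-processing step in the online-shop example (weeding out products someone looked at once and was never going to buy) might look like this. The event format, the product names and the two-view threshold are all assumptions for illustration; in practice such rules would be revisited during the analysis and visualization cycle.

```python
from collections import Counter

# Hypothetical click-stream: (customer, action, product) in time order.
EVENTS = [
    ("c1", "view", "saw"),
    ("c1", "view", "yacht"),
    ("c1", "view", "saw"),
    ("c1", "basket", "drill"),
    ("c1", "buy", "drill"),
    ("c1", "view", "saw"),
]


def preprocess(events, min_views=2):
    """Keep (customer, product) pairs that were bought or viewed at least min_views times.

    One-off views (the "pipe dream" items) are treated as noise and dropped.
    The threshold is an arbitrary assumption, not a rule from the video.
    """
    views = Counter((cust, prod) for cust, action, prod in events if action == "view")
    kept = set()
    for cust, action, prod in events:
        if action == "buy" or (action == "view" and views[(cust, prod)] >= min_views):
            kept.add((cust, prod))
    return sorted(kept)


print(preprocess(EVENTS))  # [('c1', 'drill'), ('c1', 'saw')]
```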
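Linking someone's purchase history with similar customers and asking "what are they buying?" can be sketched as a neighbourhood-based recommender that compares purchase sets with Jaccard similarity. This is a simpler stand-in for the clustering mentioned in the video, not a clustering algorithm, and the customers and items are hypothetical.

```python
def jaccard(a, b):
    """Set overlap: size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, k=1):
    """Suggest items bought by the k customers most similar to target that target hasn't bought."""
    mine = histories[target]
    others = sorted(
        (cust for cust in histories if cust != target),
        key=lambda cust: jaccard(mine, histories[cust]),
        reverse=True,
    )
    suggestions = set()
    for cust in others[:k]:
        suggestions |= histories[cust] - mine
    return sorted(suggestions)


# Hypothetical purchase histories.
HISTORIES = {
    "me": {"hand saw", "drill", "screws"},
    "alice": {"hand saw", "drill", "jigsaw"},
    "bob": {"novel", "cookbook"},
}

print(recommend("me", HISTORIES))  # ['jigsaw']
```

Because "bob" shares nothing with "me", his purchases are only suggested if k is raised, which matches the point that recommending things to very different customers is unlikely to work.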

Data Analysis 0: Introduction to Data Analysis - Computerphile

Posted by 林宜悉 on January 14, 2021