  • Data science is not about making complicated models. It's not about making awesome visualizations.

  • It's not about writing code. Data science is about using data to create as much impact as possible for your company.

  • Now, impact can take many forms.

  • It could be insights, data products, or product recommendations for a company.

  • To deliver those things, you need tools like complicated models, data visualizations, or code.

  • But essentially, as a data scientist,

  • your job is to solve real company problems using data, and we don't care what tools you use.

  • Now, there are a lot of misconceptions about data science, especially on YouTube,

  • and I think the reason is that there's a huge misalignment between

  • what's popular to talk about and what's needed in the industry. So I want to make things clear.

  • I am a data scientist working for a GAFA company (Google, Apple, Facebook, Amazon), and

  • those companies really emphasize using data to improve their products.

  • So this is my take on what data science is.

  • Before data science, the term data mining was popularized in a 1996 article called "From Data Mining to Knowledge Discovery in Databases,"

  • in which it referred to the overall process of discovering useful information from data.

  • In 2001, William S. Cleveland wanted to take data mining to another level.

  • He did that by combining computer science with data mining.

  • Basically,

  • he made statistics a lot more technical, which he believed would expand the possibilities of data mining and produce a powerful force for innovation.

  • Now you could take advantage of computing power for statistics, and he called this combination data science. Around this time,

  • Web 2.0 also emerged, where websites were no longer just digital pamphlets but a medium for a shared experience

  • among millions and millions of users.

  • These are websites like MySpace in 2003,

  • Facebook in 2004, and YouTube in 2005. We could now interact with these websites,

  • meaning we could contribute: post, comment, like, upload, share,

  • leaving our footprint in the digital landscape we call the Internet and helping create and shape the ecosystem

  • we know and love today. And guess what?

  • That's a lot of data. So much data that it became too much to handle with traditional technologies, so we call this big data.

  • That opened a world of possibilities for finding insights using data.

  • But it also meant that even the simplest questions required sophisticated data infrastructure just to handle the data.

  • We needed parallel computing technologies like MapReduce, Hadoop, and Spark.

  • So the rise of big data in

  • 2010 sparked the rise of data science, to support businesses' need to draw insights from their massive unstructured data sets.

  • Then the Journal of Data Science described data science as almost everything that has something to do with data:

  • collecting, analyzing, modeling. Yet the most important part is its applications. All sorts of applications.

  • Yes, all sorts of applications, like machine learning.

  • So in 2010, with the new abundance of data,

  • it became possible to train machines with a data-driven approach

  • rather than a knowledge-driven approach. All the theoretical papers about recurrent neural networks and support vector machines became feasible:

  • something that can change the way we live and how we experience the world.

  • Deep learning was no longer an academic concept in thesis papers.

  • It became a tangible, useful class of machine learning that would affect our everyday lives.

  • So machine learning and AI dominated the media, overshadowing every other aspect of data science,

  • like exploratory analysis,

  • experimentation, ... and the skills we traditionally called business intelligence.

  • So now the general public thinks of data scientists as

  • researchers focused on machine learning and AI, but the industry is hiring data scientists as analysts.

  • So there's a misalignment there.

  • The reason for the misalignment is that, yes, most of these data scientists could probably work on more technical problems,

  • but big companies like Google, Facebook, and Netflix have so much low-hanging fruit for improving their products that they don't need any

  • advanced machine learning or

  • statistical knowledge to find impact in their analysis.

  • Being a good data scientist isn't about how advanced your models are.

  • It's about how much impact you can have with your work. You're not a data cruncher. You're a problem solver.

  • You're a strategist. Companies will give you the most ambiguous and difficult problems, and we expect you to guide the company in the right direction.

  • OK, now I want to conclude with real-life examples of data science jobs in Silicon Valley.

  • But first I have to print some charts, so let's go do that.

  • (conversation not directly related to the topic)

  • So this is a very useful chart that shows the hierarchy of needs of data science. Now, it's pretty obvious,

  • but sometimes we kind of forget about it.

  • At the bottom of the pyramid we have collection: you obviously have to collect some sort of data to be able to use it.

  • So collecting, storing, transforming: all of this data engineering effort is pretty important, and it's

  • actually captured pretty well in the media. Because of big data, we talked about how difficult it is to manage all this data.

  • We talked about parallel computing, meaning things like Hadoop and Spark,

  • stuff like that. We know about this. Now, the thing that's less known is the stuff in between, which is right here:

  • everything that's here.

  • Surprisingly, this is actually one of the most important things for companies, because you're trying to tell the company

  • what to do with its product. So what do I mean by that? Analytics tells you,

  • using the data, what insights explain what's happening with your users. And then metrics, which are important because they tell you

  • what's going on with your product.

  • You know, these metrics will tell you whether you're successful or not. And then also, of course, A/B testing:

  • experimentation that lets you know which product versions are the best.

  • So these things are actually really important, but they're not well covered in the media. What's covered in the media

  • is this part: AI, deep learning. We've heard about it on and on, you know.

  • But when you think about it for a company, for the industry,

  • it's actually not the highest priority, or at least it's not the thing that yields the most results for the least effort.

  • That's why AI and deep learning sit at the top of the hierarchy of needs, while these things, A/B testing and analytics,

  • are actually way more important for the industry.

  • So that's why we're hiring a lot of data scientists who do that. So what do data scientists actually do?

  • Well, that depends on the company, and especially on its size.

  • A start-up kind of lacks resources,

  • so it can only have one DS. So that one data scientist

  • has to do everything. So you might see all of this

  • as being data science. Maybe you won't be doing AI or deep learning, because that's not a priority right now,

  • but you might be doing all of these. You have to set up the whole data infrastructure.

  • You might even have to write some software code to add logging, and then you have to do the analytics

  • yourself, build the metrics yourself, and run A/B tests yourself. That's why,

  • for startups that need a data scientist, this whole thing is

  • data science, which means you have to do everything. But let's look at medium-sized companies. Now, finally,

  • they have a lot more resources. They can separate the data engineers and the data scientists.

  • Collection is usually handled by software engineering, and

  • then here you're going to have data engineers doing this. And then, if your medium-sized company does a lot of

  • recommendation models or stuff that requires AI, the DS will do all of these.

  • Right. So as a data scientist, you have to be a lot more technical.

  • That's why they only hire people with PhDs or master's degrees, because they want you to be able to do the more complicated things.

  • So let's talk about large companies now.

  • Because they're a lot bigger,

  • they probably have a lot more money, and they can spend more of it on employees.

  • So they can have a lot of different employees working on different things. That way,

  • employees don't need to think about the stuff they don't want to do, and they can focus on the things they're

  • best at. For example, at my unnamed large company,

  • I'm in analytics, so I can just focus my work on analytics, metrics, and stuff like that.

  • So I don't need to worry about data engineering or AI and deep learning stuff.

  • So here's how it looks for a large company.

  • Instrumentation, logging, sensors: this is all handled by software engineers.

  • Right? And then here, cleaning and building data pipelines:

  • this is for data engineers. Now here, between these two things, we have Data Science Analytics.

  • That's what it's called.

  • But once we get to AI and deep learning, this is where we have

  • research scientists,

  • or what we call Data Science Core,

  • and they are backed by ML engineers, that is, machine learning engineers. Yeah.

  • Anyway, in summary, as you can see, data science can be all of this, and the definition

  • will vary depending on what company you're in. So please let me know what you'd like to learn more about: AI, deep learning, A/B testing,

  • experimentation, ... Depending on what you want to learn about,

  • leave a comment down below so I can talk about it, or I can find someone who knows about it and share the

  • insights with you.

  • So yeah, if you liked this video, don't forget to like and subscribe.

  • So, yeah. I hope you have a wonderful day, and I hope this was helpful. Thanks for watching.

  • Peace.
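The transcript names MapReduce, Hadoop, and Spark as the parallel computing technologies that big data required. As a rough illustration of the MapReduce idea only, here is a minimal single-process Python sketch of a word count split into map, shuffle, and reduce phases. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and this is not the Hadoop or Spark API; a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    # Shuffle: group every emitted value by its key, which a real
    # framework does between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine all values for one key into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    # Hypothetical driver: runs the three phases sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data needs big tools", "data science uses data"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key's values, both phases can be split across workers independently; that independence is what lets the pattern scale to data sets too large for one machine.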
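The speaker describes A/B testing as the experimentation that tells you which product version is best. One common way to read such an experiment, assuming the metric is a conversion rate, each user sees only one variant, and samples are large enough for a normal approximation, is a two-proportion z-test. The sketch below is an illustration under those assumptions; the function name and the conversion counts are hypothetical, and this is one standard test rather than the method any particular company uses.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of variant A (control) and B (treatment).

    Returns (z, two_sided_p_value) using the normal approximation.
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("each variant needs at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 5,000 users per variant.
    z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, variant B's 5.0% conversion rate versus A's 4.0% gives a z of about 2.4 and a two-sided p-value of about 0.016. The pooled standard error is used because the test asks how surprising the observed gap would be if both variants truly converted at the same rate.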

B1 Intermediate, American accent

What REALLY is Data Science? Told by a Data Scientist

Posted by 安東尼 (Anthony) on January 14, 2021