  • Did you know that US retail giant Walmart generates 2.5 petabytes of data from approximately

  • 1 million customers every hour? And in case you’re wondering how much is

  • a petabyte, as I did when I first read this, it is equal to 1 million gigabytes. The equivalent

  • of 13.3 years of HD video. Considering that Walmart locations are open

  • for business for more than 10 hours a day, we get a staggering 330-plus years of HD video

  • and 25 petabytes of data collected on a daily basis!
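A quick check of the arithmetic, taking the figures quoted above at face value (2.5 petabytes per hour, 1 petabyte = 1 million gigabytes, roughly 13.3 years of HD video per petabyte, and a 10-hour trading day as a lower bound). The constant and function names are illustrative only.

```python
# Figures as quoted in the video; the 10-hour day is the video's lower bound.
PETABYTES_PER_HOUR = 2.5
GIGABYTES_PER_PETABYTE = 1_000_000
HD_VIDEO_YEARS_PER_PETABYTE = 13.3
OPEN_HOURS_PER_DAY = 10


def daily_totals(hours=OPEN_HOURS_PER_DAY):
    """Scale the hourly figure up to one trading day."""
    petabytes = PETABYTES_PER_HOUR * hours
    return {
        "petabytes": petabytes,
        "gigabytes": petabytes * GIGABYTES_PER_PETABYTE,
        "hd_video_years": petabytes * HD_VIDEO_YEARS_PER_PETABYTE,
    }


totals = daily_totals()
for name, value in totals.items():
    print(f"{name}: {value:,.1f}")
```

At 25 petabytes a day, the HD-video equivalent works out to a little over 330 years.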

  • Yes, there aren’t many companies like Walmart. But even smaller enterprises nowadays generate

  • huge amounts of data, so it becomes increasingly challenging to take advantage of such

  • information abundance. And yes, data science is at the heart of all

  • that. But before we can apply data science, we must do justice to another crucial player:

  • the cloud and cloud computing in general. That’s exactly what we will focus on in

  • this video: Why cloud computing is essential for data science in the 2020s.

  • But before we continue, let me tell you about something else we’ve put together:

  • We’ve created The 365 Data Science Program to help people enter the field of data science,

  • regardless of their background. We have trained more than 350,000 people around the world

  • and are committed to continuing to do so. If you are interested in learning more, follow the

  • link in the description. It will also give you 20% off all plans if you want to start

  • learning from an all-around data science training. Now, back to cloud computing.

  • To understand the advantages cloud computing provides when it comes to data science, let’s

  • imagine a world with as much data as we have today, but without servers.

  • In such an unfortunate scenario, firms would need databases that run locally, right?

  • So, every time you, as a data scientist, want to engage in new analyses or refresh

  • an existing algorithm, you’d have to transfer information to your machine from the central

  • database, and then proceed to operate locally. This unfortunate world would have several main drawbacks:

  • Manual intervention would be necessary to retrieve data.

  • Your machine would become a single point of failure for the analyses you have worked on locally.

  • Processing speed would be limited to the computing power of your computer.

  • Chances are you would only be able to work with a limited amount of data, due to the limited computing resources at your disposal.

  • Moreover, under this setup, you wouldn’t be able to leverage real-time data to build recommender systems or any other type of machine learning algorithm that requires live data.

  • Doesn’t sound like the perfect scenario,

  • does it? Well, that’s why we invented servers. And

  • then these servers had drawbacks of their own.

  • The most obvious one is that a server takes up physical space. A cloud is basically somebody else’s server, so the space is their problem.

  • Server infrastructure is expensive to buy and set up. Cloud infrastructure is already there, simply waiting for you to use it.

  • In-house data storage requires you to keep backups, ideally in different locations. Clouds offer data everywhere, anytime, usually backed up on many different servers across the world.

  • Servers need planning. For fast-growing companies, server needs can be unpredictable even for the current quarter. With in-house servers, you usually end up buying more servers than you actually need at a given time. With the cloud, you pay only for what you use.

  • You see my point, right?
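The planning argument can be made concrete with a deliberately simplified sketch. Every number here is hypothetical, including the demand series, the per-server price, and the assumption that cloud capacity costs 1.5 times as much per unit. The in-house model also assumes the fleet is sized for the peak quarter from the start, and it ignores resale value, staffing, and power.

```python
import math


def in_house_cost(demand_per_quarter, cost_per_server_quarter):
    """Provision for the peak: every quarter pays for the largest fleet needed."""
    peak = max(math.ceil(d) for d in demand_per_quarter)
    return peak * cost_per_server_quarter * len(demand_per_quarter)


def cloud_cost(demand_per_quarter, cost_per_server_quarter, cloud_markup=1.0):
    """Pay as you go: each quarter pays only for the capacity actually used."""
    return sum(d * cost_per_server_quarter * cloud_markup
               for d in demand_per_quarter)


# Hypothetical demand, in "servers needed", for four quarters of a growing firm.
demand = [2, 3, 8, 5]
price = 1000  # hypothetical cost of one server for one quarter

owned = in_house_cost(demand, price)
rented = cloud_cost(demand, price, cloud_markup=1.5)
utilisation = sum(demand) / (max(demand) * len(demand))

print(f"in-house: {owned}, cloud: {rented}, in-house utilisation: {utilisation:.0%}")
```

Under these made-up inputs, the in-house fleet sits partly idle in three of the four quarters, which is what the transcript means by buying more servers than you need at a given time. With flat, predictable demand the comparison could easily tip the other way.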

  • Fortunately, we now have clouds. They overshadow local servers in almost every conceivable

  • aspect. And, in fact, data scientists should be focused on developing great algorithms,

  • testing hypotheses, and taking advantage of all available data, without having to wait hours

  • to see the results of the tests they are performing, and certainly without having to worry about how

  • much memory space they have left on their computer. And yes, sometimes data scientists

  • do end up waiting long hours for an algorithm to train, but with a cloud, they have the

  • option to pay more and get the job done faster. That’s yet another advantage of cloud computing

  • over servers. That being said, the biggest winners are smaller

  • entities, as they get cheap access to the same tools as enormous corporations. And this

  • is why cloud technologies are a huge enabler. They create a level playing field and allow

  • small players to compete with much bigger ones.

  • If you think about it, this technological progress changed a number of businesses in

  • a way similar to how the Internet changed commerce.

  • Remember when, all of a sudden, people around the world were able to open e-commerce stores

  • and compete on a global scale with the established firms?

  • Well, in the same way, cloud technologies democratized data analysis and data science.

  • The fact that data scientists and data analysts can rely on data stored on the cloud truly

  • makes their lives so much easier! In addition, most cloud providers allow data

  • scientists to access pre-installed open-source frameworks right away. This is not only super

  • convenient but can also be a huge time saver. By contrast, if you wanted to use Apache Spark in the conventional way, you would have to:

  • Start by installing Java,

  • Then continue by installing Scala,

  • After which you’ll be able to download Apache Spark and install it.
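As a rough illustration of the local setup burden, the sketch below checks whether the three tools listed above are already on your PATH. The command names (`java`, `scala`, `spark-submit`) are assumptions about a typical install and may differ on your platform. The helper names are made up for this example.

```python
import shutil

# Command names are assumptions about a typical install; adjust as needed.
REQUIRED_TOOLS = {
    "Java": "java",
    "Scala": "scala",
    "Apache Spark": "spark-submit",
}


def find_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool name to the path of its executable, or None if not found."""
    return {name: which(command) for name, command in tools.items()}


def missing_tools(found):
    """Return the names of the tools that were not found, in listing order."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    missing = missing_tools(found)
    if missing:
        print("Still to install:", ", ".join(missing))
```

The `which` parameter exists only so the lookup can be swapped out in a test. By default it is the standard library’s `shutil.which`.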

  • That’s the setup you need to go through if you are working on your own PC. However,

  • if you are using a cloud service, you’ll be able to start working with the Apache Spark

  • framework right away! Yep, it’s already been installed for you. The same is true of many

  • different open-source frameworks. This type of easy-to-access, easy-to-use infrastructure

  • is very attractive and potentially applies to all sorts of applications data analysts

  • and data scientists use in their work. Over the last few years, Amazon Web Services,

  • Microsoft Azure, and Google Cloud have worked to strengthen their cloud services’ ability

  • to run machine learning algorithms. The Big 3 of cloud services focused on this

  • area extensively, as they realized it could be an important source of competitive advantage

  • in the long run. And, in case you’re wondering, one of the biggest selling points of cloud machine

  • learning is that it allows small and medium enterprises to access a machine learning infrastructure

  • they otherwise wouldn’t be able to afford. For example, thanks to cloud-based machine

  • learning, a small e-commerce retailer could run a real-time recommender system algorithm

  • to improve the product offering shown to customers based on the products they have already added

  • to their cart. In this type of business, every website click can be interpreted as a particular

  • type of intention and signal, and hence the real-time updated algorithm operating in the

  • cloud will be able to make a suggestion that improves the chances of making a conversion

  • and maximizing revenues. Without cloud-based machine learning, setting

  • up the necessary infrastructure to perform this type of analysis would be really costly

  • and difficult to execute for small and medium enterprises.
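To give a feel for the cart-based recommender described above, here is a toy, in-memory sketch that counts how often products appear in the same cart and suggests the most frequent companions. It is only an illustration of the idea and not how any cloud provider’s service works. The class name, method names, and sample carts are all hypothetical, and a production system would add the streaming, storage, and scaling that the cloud infrastructure supplies.

```python
from collections import Counter, defaultdict
from itertools import permutations


class CartRecommender:
    """Toy item-to-item recommender based on how often products share a cart."""

    def __init__(self):
        # co_counts[a][b] = number of carts seen so far containing both a and b
        self.co_counts = defaultdict(Counter)

    def update(self, cart):
        """Record one observed cart; can be called as each new cart arrives."""
        for a, b in permutations(set(cart), 2):
            self.co_counts[a][b] += 1

    def recommend(self, cart, k=3):
        """Suggest up to k products that most often accompany the cart's items."""
        scores = Counter()
        for item in cart:
            scores.update(self.co_counts.get(item, {}))
        for item in cart:
            scores.pop(item, None)  # never suggest what is already in the cart
        return [item for item, _ in scores.most_common(k)]


# Hypothetical purchase history for a small online shop.
recommender = CartRecommender()
for past_cart in (
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["laptop", "keyboard"],
    ["phone", "phone case"],
):
    recommender.update(past_cart)

print(recommender.recommend(["laptop"], k=1))
```

Because `update` can be called on every new cart, the suggestions shift as fresh data arrives, which is the "real-time updated" behaviour the transcript refers to, at toy scale.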

  • It is still unclear who will win the cloud war between giants like AWS, Microsoft Azure,

  • and Google Cloud. But one thing is certain. This is a service that greatly benefits small

  • and medium-sized businesses, enabling them to level the playing field when competing

  • against large multinationals with superior IT infrastructure.

  • If you liked this video, don’t forget to give it a like, or a share!

  • And if data science is what you’d like to learn more about, subscribe to our channel,

  • where you’ll find plenty of data science insights and data science career advice.

  • Thanks for watching!


Why Cloud Computing is Critical for a Data Scientist

Posted by 林宜悉 on January 14, 2021