Did you know that US retail giant Walmart generates 2.5 petabytes of data from approximately 1 million customers every hour? And in case you’re wondering how much a petabyte is, as I did when I first read this, it is equal to 1 million gigabytes. The equivalent of 13.3 years of HD video. Considering that Walmart locations are open for business for more than 10 hours a day, we get a staggering 130 years of HD video and 25 petabytes of data collected on a daily basis!
Yes, there aren’t many companies like Walmart. But even smaller enterprises nowadays generate huge amounts of data, so it becomes increasingly challenging to take advantage of such information abundance. And yes, data science is at the heart of all that. But before we can apply data science, we must do justice to another crucial player: the cloud, and cloud computing in general. That’s exactly what we will focus on in this video: why cloud computing is essential for data science in the 2020s.
But before we continue, let me tell you about something else we’ve put together. We’ve created ‘The 365 Data Science Program’ to help people enter the field of data science, regardless of their background. We have trained more than 350,000 people around the world and are committed to continuing to do so. If you are interested in learning more, follow the link in the description. It will also give you 20% off all plans if you want to start learning from an all-around data science training.
Now, back to cloud computing. To understand the advantages cloud computing provides when it comes to data science, let’s imagine a world with as much data as we have today, but without servers. In such an unfortunate scenario, firms would need databases that run locally, right? So, every time you, as a data scientist, wanted to engage in new analyses or refresh an existing algorithm, you’d have to transfer information to your machine from the central database, and then proceed to operate locally.
This unfortunate world would have several main drawbacks:
• Manual intervention would be necessary to retrieve data.
• Your machine becomes a single point of failure for the analyses you have worked on locally.
• Processing speed would be limited to the computing power of your computer.
• Chances are you would be able to work with only a limited amount of data, due to the limited computing resources at your disposal.
• Moreover, under this setup, you wouldn’t be able to leverage real-time data to build recommender systems or any type of machine learning algorithm that requires ‘live’ data.
Doesn’t sound like the perfect scenario, does it? Well, that’s why we invented servers. And then these servers had drawbacks of their own:
• The most obvious one is that a server needs space to be stored. A cloud is basically somebody else’s server, so the storage problem is theirs.
• Server infrastructure is expensive to buy and set up. Cloud infrastructure is already there and is simply waiting for you to use it.
• In-house data storage requires you to have backups and, ideally, to keep them in different locations. Clouds offer data everywhere, anytime, usually backed up on many different servers across the world.
• Servers need planning. For fast-growing companies, server needs could be unpredictable even for the current quarter. With in-house servers, you usually end up buying more servers than you actually need at a given time. With the cloud, you pay for as much as you use.
You see my point, right? Fortunately, we now have clouds. They overshadow local servers in almost every conceivable aspect. And, in fact, data scientists should be focused on developing great algorithms, testing hypotheses, and taking advantage of all available data, without having to wait hours to see the results of the tests they are performing and certainly without having to worry about how much memory space they have left on their computer.
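The server-planning point above (buying for the peak versus paying for what you use) can be made concrete with a small sketch. The function name `capacity_summary` and the demand figures are hypothetical, chosen only for illustration. The sketch also assumes in-house capacity is sized to the busiest month and that a cloud bill simply tracks usage, which real pricing only approximates.

```python
def capacity_summary(monthly_demand):
    """Compare peak-sized in-house capacity with pay-as-you-use consumption.

    monthly_demand: servers needed in each month (hypothetical figures).
    Returns server-months provisioned in-house, server-months actually
    used (what a usage-based bill would follow), and the idle remainder.
    """
    peak = max(monthly_demand)                # in-house must cover the busiest month
    provisioned = peak * len(monthly_demand)  # those servers exist all period long
    used = sum(monthly_demand)                # cloud consumption follows demand
    return {"provisioned": provisioned, "used": used, "idle": provisioned - used}


# Made-up demand for a fast-growing company: a spike in the third month.
demand = [2, 3, 8, 4]
print(capacity_summary(demand))
```

With these invented numbers, almost half of the in-house server-months sit idle, which is the "buying more servers than you actually need" effect described above.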
And yes, sometimes data scientists do end up waiting long hours for an algorithm to train, but with a cloud, they have the option to pay more and get the job done faster. That’s yet another advantage of cloud computing over servers.
That being said, the biggest winners are smaller entities, as they get cheap access to the same tools as enormous corporations. And this is why cloud technologies are a huge enabler. They create a level playing field and allow small players to compete with much bigger ones. If you think about it, this technological progress changed a number of businesses in a way similar to how the Internet changed commerce. Remember when, all of a sudden, people around the world were able to open e-commerce stores and compete on a global scale with the established firms? Well, in the same way, cloud technologies democratized data analysis and data science. The fact that data scientists and data analysts can rely on data stored on the cloud truly makes their lives so much easier!
In addition, most cloud providers allow data scientists to access pre-installed open-source frameworks right away. This is not only super convenient but can also be a huge time saver. By contrast, if you wanted to use Apache Spark in the conventional way, you would have to:
• Start by installing Java,
• Then continue by installing Scala,
• After which you’ll be able to download Apache Spark and install it.
That’s the setup you need to go through if you are working on your own PC. However, if you are using a cloud service, you’ll be able to start working with the Apache Spark framework right away! Yep, it’s already been installed for you. The same is valid for many different open-source frameworks. This type of easy-to-access, easy-to-use infrastructure is very attractive and potentially applies to all sorts of applications data analysts and data scientists use in their work.
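To see how much of the local setup listed above (Java, Scala, Spark) is still ahead of you on your own PC, a minimal sketch like the following can check which tools are already on your PATH. It assumes the usual default executable names `java`, `scala`, and `spark-submit`, and a custom installation may use different names or locations. The helper name `missing_prerequisites` is made up for this example.

```python
import shutil

# Executables the conventional local setup is expected to put on PATH.
# These are the usual default names (an assumption); adjust for your install.
PREREQUISITES = ["java", "scala", "spark-submit"]


def missing_prerequisites(tools=PREREQUISITES):
    """Return the tools from `tools` that are not found on PATH, in order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Still to install locally:", ", ".join(missing))
    else:
        print("Java, Scala and Spark all appear to be on PATH.")
```

On a cloud service that ships Spark ready to use, this kind of check is exactly the step you get to skip.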
Over the last few years, Amazon Web Services, Microsoft Azure, and Google Cloud have worked to boost their cloud services’ capability to run machine learning algorithms. The Big 3 of cloud services have focused on this area extensively, as they realized it could be an important source of competitive advantage in the long run. And, in case you’re wondering, one of the biggest selling points of cloud machine learning is that it allows small and medium enterprises to access a machine learning infrastructure they otherwise wouldn’t be able to afford.
For example, thanks to cloud-based machine learning, a small e-commerce retailer could run a real-time recommender system algorithm to improve the product offering shown to customers based on the products they have already added to their cart. In this type of business, every website click can be interpreted as a particular type of intention and signal, and hence the real-time updated algorithm operating in the cloud will be able to make a suggestion that improves the chances of a conversion and maximizes revenue. Without cloud-based machine learning, setting up the necessary infrastructure to perform this type of analysis would be really costly and difficult to execute for small and medium enterprises.
It is still unclear who will win the cloud war between giants like AWS, Microsoft Azure, and Google Cloud. But one thing is certain: this is a service that greatly benefits small and medium-sized businesses, enabling them to level the playing field when competing against large multinationals with superior IT infrastructure.
If you liked this video, don’t forget to give it a like or a share! And if data science is what you’d like to learn more about, subscribe to our channel; you’ll find plenty of data science insights and data science career advice. Thanks for watching!
Why Cloud Computing is Critical for a Data Scientist. Posted by 林宜悉 on January 14, 2021.