Auto-generated by AI
  • For many years, data science has been called the sexiest job of the 21st century.

  • But in recent years, it seems like there's a new job vying for that title: the AI engineer.

  • So who even are these new kids on the block?

  • Are they just data scientists in disguise?

  • What's up y'all? I'm Isaac Key, and I'm a former data scientist turned AI engineer at IBM.

  • To answer these questions, I'm going to lay out four key areas in which the work of a data scientist differs from that of an AI engineer, specifically a generative AI engineer.

  • But before I dive into these differences, we first have to understand more about what's happening in the industry.

  • So traditionally, data scientists have always used AI models to do their analysis.

  • So what's changed? Well, with the advent of generative AI, the boundaries of what AI can do are being pushed in ways that we've never seen before.

  • These breakthroughs have been so significant that generative AI has split off into its own distinct field, which we call AI engineering.

  • Okay. So now that we understand the landscape, let's dive into the differences.

  • The first area of difference lies in the use cases.

  • So at a very high level, think of a data scientist as a data storyteller.

  • They take massive amounts of messy real-world data, and they use mathematical models to translate this data into insights.

  • On the other hand, think of an AI engineer as an AI system builder.

  • They use foundation models to build generative AI systems that help to transform business processes.

  • Since data scientists are fantastic storytellers, they use a lot of descriptive analytics to describe the past.

  • One example of this is what's called exploratory data analysis, or EDA, which is all about graphing the data and doing statistical inference.

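The descriptive side of EDA can be sketched in a few lines of standard-library Python. This is a minimal illustration that assumes a single numeric column, such as monthly revenue. It prints a crude text histogram in place of real charts. The names `describe` and `text_histogram` are hypothetical and do not come from any particular library.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for one numeric column (a toy EDA step)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }

def text_histogram(values, bins=5, width=30):
    """Print a crude text histogram as a stand-in for 'graphing the data'."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # avoid a zero step when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # clamp the max value into the last bin
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        bar = "#" * round(c / peak * width)
        print(f"{lo + i * step:8.1f} | {bar} {c}")
    return counts
```

In day-to-day work, a data scientist would more likely use pandas (`DataFrame.describe()`) together with a plotting library. The sketch only shows what such summaries compute.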
  • They can also do this through what's called clustering, which groups data points based on shared characteristics, for example in customer segmentation.

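Customer segmentation by clustering can be sketched with k-means, also known as Lloyd's algorithm. This minimal example assumes two numeric features per customer, such as annual spend and visits per month. The first k points seed the centroids only to keep the output deterministic. The feature choice and the function name are illustrative assumptions.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Centroids start at the first k points so the result is deterministic;
    real projects usually prefer a smarter seeding such as k-means++.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids
```

Each resulting label is a customer segment. The centroids describe the "typical" customer in that segment.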
  • Now, every good story has a reader trying to figure out what's going to come next, and that's where predictive use cases come in.

  • Unlike a book, however, a data scientist does not have the ending already written, so they have to use what are called machine learning models to make their predictions.

  • One example is regression models, which predict a numeric value such as a temperature or revenue.

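The simplest regression model is a straight line fitted by ordinary least squares. The sketch below fits y = slope * x + intercept for one input feature. Its names and data are hypothetical, for example ad spend predicting revenue.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict a numeric value from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept
```

For data lying exactly on y = 2x + 1, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers `(2.0, 1.0)`. Predicting at x = 10 then gives 21.0.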
  • Another type is classification models, which predict a categorical value such as success or failure.

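A common classification model is logistic regression. The minimal sketch below trains it by batch gradient descent on a single feature and predicts a success (1) or failure (0) label. It standardizes the feature first and uses a numerically stable sigmoid. The learning rate, epoch count, and example data are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Binary logistic regression on one feature via batch gradient descent."""
    n = len(xs)
    mu = sum(xs) / n
    sd = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5 or 1.0
    zs = [(x - mu) / sd for x in xs]  # standardized feature
    w = b = 0.0
    for _ in range(epochs):
        preds = [sigmoid(w * z + b) for z in zs]
        # Gradients of the mean log-loss with respect to w and b.
        grad_w = sum((p - y) * z for p, y, z in zip(preds, ys, zs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, mu, sd

def predict_class(model, x, threshold=0.5):
    """Return 1 (success) if the predicted probability reaches the threshold, else 0."""
    w, b, mu, sd = model
    return 1 if sigmoid(w * (x - mu) / sd + b) >= threshold else 0
```

Usage sketch with made-up data, hours studied versus pass/fail: `model = fit_logistic([1, 2, 3, 6, 7, 8], [0, 0, 0, 1, 1, 1])`. After training, `predict_class(model, 10)` returns 1.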
  • So, putting on the AI engineering hat now, one of the main kinds of use cases that AI engineers work on is prescriptive use cases, which are all about choosing the best course of action.

  • An example of this is a technique called decision optimization, which enables businesses to assess a set of possible actions and then choose the optimal path based on a set of requirements or standards.

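Decision optimization can be illustrated with a tiny, hypothetical planning problem. Each candidate action has a cost and an expected value, and the goal is the best combination within a budget. This sketch simply tries every subset. That works only for a handful of options. Real decision-optimization tools typically hand the problem to a linear or mixed-integer programming solver instead.

```python
from itertools import combinations

def best_plan(actions, budget):
    """Exhaustively search subsets of actions for the highest total value within budget.

    `actions` is a list of dicts with hypothetical keys "name", "cost", "value".
    """
    best_value, best_subset = 0, ()
    for r in range(1, len(actions) + 1):
        for subset in combinations(actions, r):
            cost = sum(a["cost"] for a in subset)
            value = sum(a["value"] for a in subset)
            # The budget acts as the hard requirement; value is the objective.
            if cost <= budget and value > best_value:
                best_value, best_subset = value, subset
    return [a["name"] for a in best_subset], best_value
```

With made-up actions (an email campaign costing 2 for value 3, a discount costing 5 for value 8, a new hire costing 6 for value 9, and ads costing 3 for value 5), a budget of 8 selects the discount and the ads for a total value of 13.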
  • Another example of a prescriptive use case is creating what are called recommendation engines.

  • As an example, this can involve suggesting targeted marketing campaigns for a select customer base.

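One classic way to build a recommendation engine is user-based collaborative filtering. The next sketch recommends marketing campaigns a customer hasn't engaged with yet. It weights each other customer's engagement by their cosine similarity to the target customer. The customer names, campaign names, and engagement scores are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(target, engagement, top_n=2):
    """Score unseen items by other users' ratings, weighted by their similarity to `target`."""
    seen = engagement[target]
    scores = {}
    for user, items in engagement.items():
        if user == target:
            continue
        sim = cosine(seen, items)
        for item, rating in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

In this design, a customer who resembles others that responded to a campaign is likely to respond too. Production systems usually learn such similarities from far larger interaction data.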
  • In addition to prescriptive use cases, there are also generative use cases, hence the name generative AI.

  • Now, foundation models, which I will touch on more in a bit, enable the creation of what are called intelligent assistants.

  • For example, a coding assistant or a digital advisor.

  • They also enable the creation of chatbots, which support conversational search through information retrieval and the summarization of various content.

  • So after we have a use case identified, we need data.

  • Now, people say that data is the new oil because, like oil, you have to search for and find the right data and then use the right processes to transform it into various products, which in turn power various processes.

  • For a data scientist, the oil of choice is often structured data, aka tabular data.

  • Do note that data scientists still work with unstructured data, but not as much as AI engineers.

  • Now, these tables often contain on the order of hundreds to hundreds of thousands of observations.

  • They require a lot of cleaning and pre-processing before the data can be modeled.

  • For example, cleaning can involve removing outliers, joining and filtering tables, or even creating entirely new features.

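The cleaning steps just listed can be sketched on plain Python lists of dicts: outlier removal, a join, a filter, and a new feature. In this minimal illustration, outliers are defined with Tukey's fences (1.5 × IQR). The column names (`amount`, `items`, `customer_id`, `region`) and the EU filter are hypothetical.

```python
import statistics

def remove_outliers(rows, key, k=1.5):
    """Drop rows whose `key` value falls outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles([r[key] for r in rows], n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if lo <= r[key] <= hi]

def inner_join(left, right, on):
    """Inner-join two lists of dicts on a column (assumes `on` is unique in `right`)."""
    index = {row[on]: row for row in right}
    # {**a, **b} builds new dicts, so the input rows are never mutated.
    return [{**row, **index[row[on]]} for row in left if row[on] in index]

def clean(orders, customers):
    rows = remove_outliers(orders, "amount")
    rows = inner_join(rows, customers, on="customer_id")
    rows = [r for r in rows if r["region"] == "EU"]  # filter
    for r in rows:  # feature creation
        r["amount_per_item"] = r["amount"] / r["items"]
    return rows
```

On real tabular data, the same steps would usually be written with pandas (`merge`, boolean indexing, and column assignment).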
  • This clean data is then used to train various machine learning models.

  • For an AI engineer, on the other hand, the oil of choice is mainly unstructured data, such as text, images, videos, and audio files.

  • Let's take a text-based foundation model, a large language model (LLM), as an example.

  • These models are trained on anywhere from billions to trillions of tokens of text, a much larger scale than traditional machine learning models.

  • This leads me to the next area of difference, which is the underlying models.

  • The data science toolbox consists of hundreds of different models and algorithms to choose from.

  • Due to the nature of these models, each different use case requires gathering a different data set, and thus requires training a different model.

  • As a result, the scope of these individual models is a lot narrower, meaning that it's harder for them to generalize beyond the domain of data they've been trained on.

  • Generally speaking, these models are a lot smaller in terms of the number of parameters.

  • They take less compute power for training and inference, and they require less time to train, anywhere from seconds to hours.

  • Now, on the other hand, the generative AI toolbox is a lot less cluttered, and it really only contains one type of model: the foundation model.

  • Foundation models are revolutionary because they allow a single model to generalize to a wide range of tasks without having to be retrained.

  • Thus, their scope is much wider.

  • Due to the sophistication of these models, they are a lot larger, often with billions of parameters.

  • They require a lot more compute power to train.

  • We're talking hundreds to thousands of GPUs, and they require a lot more training time: anywhere from weeks to months.

  • Due to the differences in the intrinsic nature of traditional machine learning models and foundation models, the underlying processes and techniques used to develop solutions with them also differ.

  • So, a typical data science process will look something like this.

  • You start off with a use case, and then from that use case, you pick the right data.

  • Then, after that data is prepared, you use it to train and validate a model using techniques such as feature engineering, cross-validation, or hyperparameter tuning.

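Cross-validation and hyperparameter tuning work together. The following minimal sketch tunes the number of neighbours `k` of a one-feature k-nearest-neighbours classifier. It scores each candidate with shuffled k-fold cross-validation, then keeps the best one, which is a simple grid search. The model choice, candidate values, and function names are illustrative assumptions. Libraries such as scikit-learn provide this workflow through `GridSearchCV`.

```python
import random

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest (feature, label) training rows."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = [label for _, label in neighbours]
    # sorted() makes tie-breaking deterministic: the smallest label wins a tie.
    return max(sorted(set(votes)), key=votes.count)

def cross_val_accuracy(data, k, folds=3, seed=0):
    """Shuffled k-fold cross-validation accuracy for one value of the hyperparameter k."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        val_idx = order[f::folds]  # every `folds`-th shuffled index forms one fold
        val_set = set(val_idx)
        train = [row for i, row in enumerate(data) if i not in val_set]
        correct += sum(knn_predict(train, data[i][0], k) == data[i][1] for i in val_idx)
    return correct / len(data)

def tune_k(data, candidates=(1, 3, 5), folds=3):
    """Grid search: keep the candidate with the best cross-validated accuracy."""
    scores = {k: cross_val_accuracy(data, k, folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores
```

Each validation fold is held out from training. As a result, the score estimates how the model generalizes to unseen data rather than how well it memorizes.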
  • The model is then deployed to an endpoint, for example in the cloud, to do real-time prediction and inference.

  • Now, on the other hand, the generative AI process also starts off with a use case, but then we can skip directly to working with a pre-trained model.

  • What makes this possible is a phenomenon called AI democratization, which is a big fancy term that simply means making AI more widely accessible to everyday users.

  • Some of the best foundation models out there are published to open-source communities such as Hugging Face.

  • Since these models are so generalizable and so powerful out of the box, they make it easy for developers to get started.

  • AI engineers interact with these foundation models via natural language instructions, prompting them to do various tasks.

  • This process is known as prompt engineering.

  • Now, prompt engineering can be used in conjunction with different frameworks to build larger AI systems.

  • Examples of these frameworks include chaining different prompts together, doing what's called parameter-efficient fine-tuning (PEFT) on domain-specific data, using retrieval-augmented generation (RAG) to ground answers in truth, or even creating autonomous agents that reason through very complex multi-step problems.

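Prompt chaining means feeding the output of one prompt into the next. The sketch below keeps the model abstract: `model` is assumed to be any function that takes a prompt string and returns text, whether it wraps a hosted or a local LLM. The prompt wording and the support-ticket scenario are hypothetical.

```python
from typing import Callable

# Assumed interface: a function that takes a prompt and returns the model's text.
Model = Callable[[str], str]

SUMMARY_PROMPT = "Summarize the following support ticket in one sentence:\n\n{ticket}"
ACTION_PROMPT = (
    "Given this summary of a support ticket, state the next action "
    "for the support team as a single imperative sentence:\n\n{summary}"
)

def chain(ticket: str, model: Model) -> dict:
    """Two-step prompt chain: the first prompt's output becomes input to the second."""
    summary = model(SUMMARY_PROMPT.format(ticket=ticket)).strip()
    action = model(ACTION_PROMPT.format(summary=summary)).strip()
    return {"summary": summary, "action": action}
```

Breaking a task into small, focused prompts like this makes each step easier to inspect and test. In tests, a stub function can stand in for the real model.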
  • These are just a few examples of the building blocks that can be used to build larger AI applications.

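The retrieval half of RAG can be sketched without any external services. The sketch ranks passages by bag-of-words cosine similarity to the question, then builds a prompt that tells the model to answer only from the retrieved context. Real RAG systems typically use learned embeddings and a vector store instead of word counts. The documents and prompt wording here are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts, a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = bag_of_words(query)
    return sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:top_k]

def build_grounded_prompt(query, documents, top_k=2):
    """Put the retrieved passages into the prompt so the answer is grounded in them."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below. If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The resulting prompt string would then be sent to a foundation model. Restricting the model to the retrieved context is what "grounding" refers to.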
  • The last step is to then embed the AI in a larger system or workflow.

  • This can take the form of creating assistants or virtual agents, building a larger application with a UI, or even doing some sort of automation.

  • So, okay, let's take a step back and look at all the differences at a very high level.

  • As we can see, the breakthroughs in generative AI underpin many of the differences in the use cases, data, models, and processes that data scientists and AI engineers work on.

  • It's important to note that there is still overlap between the two fields.

  • For example, data scientists will still work on prescriptive use cases, and AI engineers will still work with structured data.

  • Regardless of these differences, both fields are continuing to evolve at a blazing-fast pace, with new research papers, new models, and new tools coming out every single day.

  • With data, AI, and a creative mind, really anything is possible.

  • Thank you for tuning in. I hope this was helpful.

  • Until next time, peace.

  • If you like this video and want to see more like it, please like and subscribe.

  • If you have any questions or want to share your thoughts about this topic, please leave a comment below.
