[MUSIC PLAYING]

LEZHI LI: Hi, everyone. My name's Lezhi, and I'm a software engineer at Uber. Today I'm going to talk about how we built a visual debugging tool for machine learning using TensorFlow.js.

So why is model debugging important? Machine learning practitioners report that they spend the majority of their time not on building a model, but instead on iterating on and debugging an existing model. So there's a huge opportunity for us to improve the efficiency of that 80% of the time.

Traditionally, the only guidance for model developers evaluating model performance is to look at performance metrics. Although these metrics are useful, they do not give much insight into how to improve a model or why a model performs in a certain way. So given the intrinsic opacity of machine learning algorithms, it is very hard for anyone to understand model performance.

So how do we solve that problem? The idea here is that we can transform a model-space problem into a data-space problem. By that we mean that, instead of asking what went wrong with the model, we look at which data this model made mistakes on. And instead of asking why a model makes certain mistakes, we look into the feature characteristics of these failed data points.

Based on those two ideas, we developed Manifold, which is a model-agnostic visual debugger for machine learning. Here's the workflow of using Manifold. The user connects Manifold to the output datasets of several machine learning models, and Manifold automatically segments these datasets into subsets, each subset containing data points with similar performance. The user then chooses the subsets of interest to compare against each other, and Manifold highlights the difference in feature distributions between the two subsets, helping them diagnose the behavior behind the performance outcome.
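The segmentation step described above needs a per-point performance score for each model. The talk doesn't specify the metric, but for a binary classifier, per-instance log-loss is one natural choice; here is a minimal plain-JavaScript sketch (the function name and signature are illustrative, not Manifold's actual API):

```javascript
// Per-instance log-loss for a binary classifier.
// `preds` holds predicted probabilities of the positive class,
// `labels` holds 0/1 ground-truth values of the same length.
// Probabilities are clipped to avoid log(0).
function perInstanceLogLoss(preds, labels, eps = 1e-7) {
  return preds.map((p, i) => {
    const clipped = Math.min(Math.max(p, eps), 1 - eps);
    return labels[i] === 1 ? -Math.log(clipped) : -Math.log(1 - clipped);
  });
}
```

A per-point score like this (rather than one aggregate number per model) is what lets the tool cluster data points by how well each model handled them.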
As we developed these ideas into production, we faced several technical challenges, among them a performance challenge and a portability challenge. Traditionally, it is the model training back end's job to handle performance metric calculation. But that pattern is no longer applicable to our visual interface because of the latency introduced by recalculation in response to user interaction. And also, if we want to connect Manifold to another machine learning training back end, there are two pieces of code we need to port, the back-end code and the front-end code. But in reality, the metrics calculation logic belongs to the visual tool and should not be injected into the training back end. Those two reasons are why we put this computation logic inside the front end. And because this computation can get intensive as data volume increases, we use TensorFlow.js to help us increase the computation efficiency.

So what are the intensive computations involved in the Manifold interface? In the performance comparison view, we compute performance scores for each data point on each model and use those metrics to run K-means clustering to segment the dataset into subsets. And in the feature attribution view, for each feature we compute the distribution histograms of the two different subsets, and use those histograms to compute KL-divergence to rank feature importance for model developers inspecting the model performance. In all of those scenarios, TensorFlow.js gave us a large performance boost compared to a plain JavaScript implementation. In some cases, the boost was as high as 100 times, for the per-instance model metrics computation.

So to conclude, complex tasks such as machine learning diagnosis can benefit a lot from the numerical computation capacity of TensorFlow.js. And TensorFlow.js opens up new opportunities for developers of visual analytics tools. OK, that's it.
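The feature attribution step above can be sketched in plain JavaScript: build a normalized histogram of a feature's values in each subset, then score the feature by the KL-divergence between the two histograms. This is an illustrative sketch under assumed bin settings, not Manifold's actual implementation (which the talk says runs these computations with TensorFlow.js for speed):

```javascript
// Normalized histogram of `values` over `nBins` equal-width bins
// spanning [min, max]. A small epsilon keeps every bin nonzero so
// the KL-divergence below stays finite.
function histogram(values, min, max, nBins, eps = 1e-6) {
  const counts = new Array(nBins).fill(eps);
  const width = (max - min) / nBins;
  for (const v of values) {
    const bin = Math.min(nBins - 1, Math.max(0, Math.floor((v - min) / width)));
    counts[bin] += 1;
  }
  const total = counts.reduce((a, b) => a + b, 0);
  return counts.map((c) => c / total);
}

// KL-divergence D(P || Q) between two discrete distributions of equal length.
function klDivergence(p, q) {
  return p.reduce((sum, pi, i) => sum + pi * Math.log(pi / q[i]), 0);
}
```

Features whose value distributions diverge most between the well-performing and poorly-performing subsets rank highest, pointing the developer at the data characteristics most associated with the failures.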
Thank you. [APPLAUSE] [MUSIC PLAYING]
Building a Visual Debugging Tool for ML - TF.js in Interactive Visual Analytics (TF Dev Summit '19)