>> [Narrator] Live from New York, it's The Cube, covering the IBM Machine Learning Launch Event, brought to you by IBM. Here are your hosts, Dave Vellante and Stu Miniman. >> Good morning, everybody, welcome to the Waldorf Astoria. Stu Miniman and I are here in New York City, the Big Apple, for IBM's Machine Learning Event, #IBMML. We're fresh off Spark Summit, Stu, where we had The Cube. This, by the way, is The Cube, the worldwide leader in live tech coverage. We were at Spark Summit last week, George Gilbert and I, watching the evolution of so-called big data. Let me frame, Stu, where we're at and bring you into the conversation. The early days of big data were all about offloading the data warehouse and reducing the cost of the data warehouse. I often joke that the ROI of big data is reduction on investment, right? There were these big, expensive data warehouses. It was quite successful in that regard. What happened next is we started to throw all this data into the data warehouse. People would joke that it became a data swamp, so you had a lot of tooling to try to clean the data warehouse, a lot of transforming and loading, and the ETL vendors started to participate there in a bigger way. Then you saw the extension of these data pipelines to try to do more with that data. The Cloud guys have now entered in a big way. We're now entering the Cognitive Era, as IBM likes to refer to it. Others talk about AI and machine learning and deep learning, and that's really the big topic here today. What we can tell you is that the news goes out at 9:00 this morning, and it was well known that IBM is bringing machine learning to its mainframe, the z mainframe. Two years ago, Stu, IBM announced the z13, which was really designed to bring analytic and transaction processing together on a single platform. Clearly IBM is extending the useful life of the mainframe by bringing things like Spark, certainly what it did with Linux, and now machine learning into z. 
I want to talk about Cloud, the importance of Cloud, and how that has really taken over the world of big data. Virtually every customer you talk to now is doing work on the Cloud. It's interesting to see IBM now unlocking its transaction base, its mission-critical data, to this machine learning world. What are you seeing around Cloud and big data? >> We've been digging into this big data space since before it was called big data. One of the early things that really got me interested and excited about it is that, from the infrastructure standpoint, storage has always been one of those costs we had to bear, and with the massive amounts of data, the digital explosion we talked about, keeping and managing all that information was a huge challenge. Big data was really that bit flip. How do we take all that information and make it an opportunity? How do we get new revenue streams? Dave, IBM has been at the center of this, looking at the higher-level pieces of not just storing data but leveraging it. Obviously it's huge in analytics, with lots of focus on everything from Hadoop and Spark to newer technologies, but it's digging in to how it can leverage up the stack, which is where IBM has done a lot of acquisitions. It wants to make sure it has a strong position both in Cloud, where SoftLayer has been renamed IBM Bluemix, with a lot of services including a machine learning service that leverages the Watson technology, and of course on-prem, where it's got the z and the Power solutions that you and I have covered for many years at the IBM Edge show. >> Machine learning obviously leverages models heavily. We've seen that in the early days of big data, the data scientists would build models, and machine learning allows those models to be perfected over time. So there's this continuous process. 
We're familiar with the world of batch, and then the minicomputer brought in the world of interactive, so we're familiar with those types of workloads. Now we're talking about a new, emergent workload, which is continuous: continuous apps where you're streaming data in, which is what Spark is all about. The models that data scientists are building can constantly be improved. The key is automation, right? Being able to automate that whole process, and being able to collaborate between the data scientists, the data quality engineers, even the application developers. That's something IBM really tried to address in its last big announcement in this area, the Watson Data Platform in October of last year, what they called at the time DataWorks. So it's really trying to bring together those different personas in a way that they can collaborate and improve models on a continuous basis. The use cases that you often hear in big data, and certainly initially in machine learning, are things like fraud detection. Obviously ad serving has been a big data application for quite some time. In financial services, there's identifying good targets and identifying risk. What I'm seeing, Stu, is that the phase we're in now of this so-called big data and analytics world, now bringing in machine learning and deep learning, is really about improving on some of those use cases. For example, fraud detection has gotten much, much better. Ten years ago, let's say, it took many, many months, if you ever detected fraud at all. Now you catch it in seconds, or sometimes minutes, but you also get a lot of false positives. Oops, sorry, the transaction didn't go through. Did you do this transaction? Yes, I did. Oh, sorry, you're going to have to redo it because it didn't go through. It's very frustrating for a lot of users. That will get better and better and better. We've all experienced retargeting from ads, and we know how crappy they are. That will continue to get better. 
The big question people have, and it goes back to Jeff Hammerbacher's line that the best minds of his generation are trying to get people to click on ads, is when we will see big data really start to affect our lives in different ways, like patient outcomes. We're going to hear some of that today from folks in health care and pharma. Again, these are the things that people are waiting for. The other piece is, of course, IoT. What are you seeing in terms of IoT in the whole data flow? >> Yes, a big question we have, Dave, is where's the data? And therefore, where does it make sense to do that processing? In big data we talked about how, when you've got massive amounts of data, you can move the processing to that data. With IoT, the day before, our CTO talked about how there's going to be massive amounts of data at the edge, and I don't have the time or the bandwidth or necessarily the need to pull that back to some kind of central repository. I want to be able to work on it there. Therefore a lot of data is going to be worked on at the edge. Peter Levine did a whole video talking about how, "Oh, Public Cloud is dead, it's all going to the edge." That statement is a little bit hyperbolic; we understand that there are plenty of use cases for both Public Cloud and the edge. In fact, we see Google pushing TensorFlow, one of the big machine learning frameworks out there that we expect a lot of people to be working on. Amazon is putting effort into the MXNet framework, which is once again an open-source effort. One of the things I'm looking at in this space, and where I think IBM can provide some leadership, is which frameworks are going to become popular across multiple scenarios. How many winners can there be for these frameworks? We already have multiple programming languages, multiple Clouds. How much of it is just API compatibility? 
How much work is there, where are the repositories of data going to be, and where does it make sense to do that predictive analytics, that advanced processing? >> You bring up a good point. Last year, last October, at Big Data NYC, we had a special segment with a data scientist panel. It was great. We had some rockstar data scientists on there like Dez Blanchfield and Joe Caserta, and a number of others. They echoed what you always hear when you talk to data scientists: "We spend 80% of our time messing with the data, trying to clean the data, figuring out the data quality, and precious little time on the models, improving the models, and actually getting outcomes from those models." So things like Spark have simplified that whole process and unified a lot of the tooling around so-called big data. We're seeing Spark adoption increase. George Gilbert, in part one and part two of our Wikibon big data forecast last week, showed that we're still not on the steep part of the S-curve in terms of Spark adoption. Streaming is generally included in that forecast as well, and it projects that those applications are going to become more and more important. That brings you back to what IBM is trying to do, which is bring machine learning to this critical transaction data. Again, to me, it's an extension of the vision that they put forth two years ago, bringing analytic and transaction data together, actually processing within that Private Cloud complex, which is essentially what this mainframe is. It's the original Private Cloud, right? You were saying off-camera, it's the original converged infrastructure. It's the original Private Cloud. >> The mainframe's still here, with lots of Linux on it. We've covered it for many years: if you want your cool Linux, Docker, containerized machine learning stuff, you can do that on the z Series. 
>> You want Python and Spark and R and Java, and all the popular programming languages. It makes sense. It's not a huge growth platform; it's kind of flat, down, or up depending on the product cycle, but it's alive and well, and a lot of companies obviously run their businesses on the z. We're going to be unpacking that all day. Some of the questions we have are: what about Cloud? Where does it fit?