Traditionally, where people have used compute in their research,
they know what to do because they're experts in the field. A lot of our classical users of
HPC know exactly what they want. They know how to process their data.
They know they need to use Scientific Linux.
They can write software to process their data, and they can get their own research questions answered by themselves.
Part of Information Services' job is simply to provide the hardware and the grunt that they need to do that. But
increasingly we're finding that other people want to use compute in their research
but don't really know how; they haven't got those skills. So part of my role is to
link up their research problem with the skills, the software and the analysis tools that they need to answer their questions. I
was a medicinal chemist, which is a type of synthetic chemist who designs and makes new drugs. So that covers
all the different processes: designing a small molecule for a particular drug target,
doing some computational docking using the HPC to find out, of the million possibilities,
which ten I should make and then test to see which might be the cure for cancer,
asthma,
Alzheimer's disease, or just the new painkiller. I
started off my training doing a PhD in chemistry, just the simple questions: how do you make this compound, what reagents do you use,
how do you make it? But as I progressed in my career I moved into the drug discovery space,
where the chemistry is just applied as a technique. That's just one of the tools, and the interesting and intelligent question is:
this is the drug target; what should I make to interfere with that target to have that biological effect?
My interest really moved into using computers to answer those kinds of questions.
So what do we make? Why do we make it? Where should we direct our efforts?
Traditionally, in old-style research, it would just be somebody sat in an office thinking about a particular problem and then proposing an answer.
But as the scope and the possibilities increase, we've got more data available,
we've got more possibilities available, and it really expands beyond what one person can hold in their head. So, as research is
interdisciplinary and we use chemists, biologists, engineers and mathematicians,
even in a traditional subject like wet organic chemistry we now need to use computers to help analyze
possibilities, data and questions. So expanding that
research space into using computers is becoming increasingly important.
How I got involved in HPC and computing was doing something called docking. So if we have a
small molecule, a drug molecule like aspirin or salbutamol, that we think might be a good molecule for a particular drug target,
what we can do is use the computer and ask it a question: does this small drug molecule fit into
this receptor protein? How good a fit is it?
Where does it fit in? What shape is it when it fits? So what we can do is use specialist software
packages to ask that question of
hundreds of thousands or even millions of small molecules. So you will prepare a question:
how well do these five million drug molecules fit into this receptor? With the question and with the software,
I'll submit that question to the HPC queue, and that's where the HPC takes over and says: okay, this guy's got this question.
That's broken down into actually 25 million sub-questions,
and it's the HPC scheduler that then splits that job up across separate nodes and distributes it out to different processor cores.
So I could have 2,000 processor cores,
each one working on docking one drug molecule into that receptor. When a core has finished with that one,
it will tell a master controller that it's done
and it will be allocated the next one to do. So it's kind of like the HPC is acting as my research assistant,
answering all those millions and millions of questions for me while I'm in the lab making - or
having a cup of coffee or lunch,
or chatting to the boss about what the next question is.
That particular question sounds like a very, very complicated puzzle really, doesn't it?
It sounds like a really complicated puzzle,
but the techniques and the software tools for asking that question, how well does that molecule fit into there, are very mature.
It's a very well known, very well understood subject. The problem is really the scale: I
can't have enough computing power to dock every single possible molecule into every single drug target.
It's got to be an intelligent choice. But as computing power is ever increasing,
you know, Moore's law, more processors, more memory,
I can ask more of the computer and get it to tell me more information, so I can concentrate on the chemistry-specific knowledge.
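The docking workflow described above, where each free core reports back to a master controller and is handed the next molecule, can be sketched roughly as a task farm. This is only a minimal illustration in Python, not the scheduler or docking software used in the interview: `dock`, `run_task_farm` and `best_candidates` are hypothetical names, the score is a made-up placeholder, and worker threads stand in for processor cores.

```python
import queue
import threading


def dock(molecule: str, receptor: str) -> float:
    """Hypothetical stand-in for a docking program; returns a fake score."""
    return (sum(ord(c) for c in molecule + receptor) % 100) / 10.0


def run_task_farm(molecules, receptor, n_workers=4):
    """Share the molecules out to workers, one at a time, until none are left."""
    tasks = queue.Queue()
    for molecule in molecules:
        tasks.put(molecule)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # A free worker asks the shared queue for the next molecule.
                molecule = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            score = dock(molecule, receptor)
            with lock:
                results[molecule] = score  # report the finished result

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def best_candidates(results, n=10):
    """Pick the n best molecules, assuming (for illustration) lower is better."""
    return sorted(results, key=results.get)[:n]


if __name__ == "__main__":
    library = [f"molecule_{i}" for i in range(1000)]  # made-up names
    scores = run_task_farm(library, "receptor_A", n_workers=8)
    print(best_candidates(scores, n=10))
```

Because each worker only takes a new molecule when it has finished the previous one, slow and fast calculations balance out without anyone having to plan the split in advance.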
I'm imagining
researchers from across the university, and certainly across the world, wanting to use computers to answer these questions.
Is there one science that uses it more than others?
No, I don't think there is really. I think traditionally users have come from physics, astronomy, chemistry,
and engineering in particular, but also increasingly biology, genomics researchers, the life sciences.
And we're now seeing people from the social sciences, humanities and even arts coming in to start using
computers in their research.
Are these high-performance computing systems,
or clusters, becoming the norm? Are people expecting these as part of their research now?
Absolutely, yes. Again,
going back years, what we needed to provide to an academic
would be an office, a desk, a green board and some chalk. Now people really expect that high-performance
computing facilities are available to them, and the university does that by providing HPC facilities like the one we're looking at today,
but also by renting them from cloud providers like Amazon,
Microsoft Azure and Google, and others as well.
Thinking of those leaders of the field, Google and Amazon, cloud computing etc.,
why would you do it yourself if all those options are available?
There's a number of reasons we might want to do it ourselves.
First and foremost, those companies are really good at it,
but they're really good at it because they charge for doing it.
So there is a higher cost associated with renting somebody else's kit to do it.
There are times when we need to do it on-site for security reasons.
If we're working on some very sensitive research material, maybe something in the medical field with patient data,
we'll need to guarantee to the funders, and to whoever owns that data, that
it's very safe and secure by keeping it on-site.
Equally, we might be doing something, say with genomics research, where the quantity of data is so vast,
terabytes of data per hour, that we need to analyze it and process it here on site,
and the cost and the time associated with shipping that data somewhere else
to process it and answer those research questions,
and then bringing the data back again, is prohibitive. Equally, there are times when we're using remote data, say satellite imagery or
data from the Human Genome Project.
Well, that data already exists off in the cloud,
so it makes far more sense for our researchers to take the question to the data
and analyze that data off-site. So by offering both,
we can hopefully give researchers the ability to choose: this one's better for me,
or this one's better for me.
Does it ever go horribly wrong, in that you ask a question and you've made a mistake and
wasted hours and hours or days of HPC?
Absolutely, it's happened to me more times than I care to mention.
One time in particular I was doing that docking of small molecules into drug
targets, only a couple of hundred thousand. There was some sort of software error and it started churning out an ever-increasing error file.
The size of the error file went through 300 gigabytes.
It blocked the entire system and everybody else's jobs failed. Twenty or so people were quite angry with me that I'd just killed all of their research.
But that kind of thing does happen in research,
and I guess that's the computational equivalent of the professor having an explosion in the lab and
spraying stuff all over the room, which I've also done.
Maybe I'm in the physics lab working with Professor Moriarty
and I want some computing time. How much is it going to cost me to use that kit?
That's a good question. At the minute
we don't actually charge our researchers directly for using an HPC facility,
so there's no per-hour
charge for using a
computer core.
You do know this is going to be available for them all to watch, and then they're all going to start
clamoring after this?
Absolutely. And
if more people need more resources and we can provide them, we can look to work for them. We want our researchers
to be ambitious and to try and push the boundaries, and we can't do that by unnecessarily restricting
access to kit.
If money doesn't really come into it in that way, how do you decide who does what? And are there fights
that erupt as to who needs the compute power more than who else?
There are vigorous discussions each time we get a new
HPC system. As you can see in there, it's a large
HPC. There are so many processor cores and it can be used for such and such a period of time,
and there's always a debate about
how much of it at any one time somebody is allowed to use, and how long any one person's job can go on for. So it's
very much like Tetris: how can I fit these variable width and size blocks in?
Is it better for me to use a thousand cores for 10 hours or 100 cores for many thousands of hours?
And how do I fit that workload in? It always causes vigorous discussion and a few disagreements.
There's always moaning that the queue's far too long, but that's just how it is.
Where does the buck stop in terms of the decision-making? Is it left to the software, or does a human come in and say?
Every time we review the process, a group of humans will sit down and decide: okay,
we think it's fair if
everybody's allowed to run up to a thousand cores for up to four days' time. We sit down and make those decisions
as a research community, and then the software rigorously enforces those limits for us.
So you are only allowed up to so many hundreds or thousands of cores, and your job is only allowed to run for a
maximum of four days. If it goes over four days it's stopped, and the next person's given access to that
resource.
In a way that could be limiting research, right?
It's something that the researchers have got to try to fit into their research plans, in the same way that
office space and laboratory space can be a limiting factor.
It's: how do I divide my research question up on the computer? Am I better
using two thousand cores for a short period of time, or is it the kind of question that is
only limited to a thousand cores and needs to run for a longer period of time?
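The trade-off between many cores for a short time and fewer cores for a long time comes down to core-hours, checked against whatever limits the community has agreed. The sketch below is only an illustration: it takes the "thousand cores, four days" figures mentioned in the interview as example limits, `JobRequest` and `check_job` are hypothetical names rather than any real scheduler's API, and it ignores the fact that real workloads rarely scale perfectly when spread over more cores.

```python
from dataclasses import dataclass

MAX_CORES = 1000     # example per-job core limit quoted in the interview
MAX_HOURS = 4 * 24   # example wall-time limit: four days, in hours


@dataclass
class JobRequest:
    cores: int
    hours: float

    @property
    def core_hours(self) -> float:
        """Total compute requested: cores multiplied by wall-clock hours."""
        return self.cores * self.hours


def check_job(job: JobRequest, max_cores=MAX_CORES, max_hours=MAX_HOURS):
    """Return a list of limit violations; an empty list means the job fits."""
    problems = []
    if job.cores > max_cores:
        problems.append(f"asks for {job.cores} cores, limit is {max_cores}")
    if job.hours > max_hours:
        problems.append(f"asks for {job.hours} hours, limit is {max_hours}")
    return problems


if __name__ == "__main__":
    wide = JobRequest(cores=1000, hours=10)    # short and wide
    narrow = JobRequest(cores=100, hours=100)  # long and narrow
    too_wide = JobRequest(cores=2000, hours=5)
    for job in (wide, narrow, too_wide):
        print(job, job.core_hours, check_job(job))
```

Here the wide and narrow requests ask for the same total of 10,000 core-hours, so the choice between them depends on how well the problem splits up and on what fits in the queue, while the 2,000-core request would be rejected under these example limits.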
Do you need to know a fair bit about computing just to make those decisions, though?
Traditionally, yes. I think there has been an expectation that you have to know a lot about how computers work to make those decisions, but
we are seeing now, as the use of computers in research moves into other areas,
that part of my role is to try and help people to
understand how computers think, as computers work in a very different way to a researcher.
They're good at answering the same question for a long period of time, in parallel. So
changing from how you would do it
in person, how you would do some
calculations, to how a computer would do it, lots of different things at the same time, is a
step change for some people.
The equipment itself is fairly generic. These are standard
blade enclosures, the storage is standard storage, and we have about 240 terabytes in this.
It's all connected up by InfiniBand.