[ Noise ]
[ Silence ]
>> Welcome and thanks for coming this afternoon.
I'm Dan Rockmore, Chair of the Department of Mathematics here
at Dartmouth and also Director
of the William H. Neukom Institute
for Computational Science.
On behalf of the college, the institute, and the friends
of Dartmouth library, it's my pleasure to be able
to introduce Professor Robert Darnton today
of Harvard University who will be speaking to us
on the Digital Public Library of America and the Digital Future.
This is the third lecture on our leading voices
in higher education series and, moreover,
the Inaugural Donoho Colloquium.
The Donoho Colloquia will be an ongoing series
of public lectures aimed at increasing awareness
of the many important and sometimes surprising places
in which computational ideas appear.
This is a central piece of the larger mission
of the Neukom Institute whose aim is to support
and integrate computational thinking
and computational ideas throughout Dartmouth.
These lectures are made possible by a generous gift from David,
Miriam, and Dan Donoho in honor of Dan's graduation as a member
of the class of 2006 and were Dan's brainchild, in fact,
to honor that graduation.
Dan is at present in the Emergency Room working
through his anesthesiology rotation,
don't be worried it's not in the Emergency Room.
But we are fortunate to have David and Miriam
who moved heaven and earth to get here and--
so thank you very much for coming and for your gift.
[ Applause ]
When we looked to initiate the Donoho Colloquium,
I immediately thought of Robert Darnton as the first lecturer.
He's a leading authority on the French Enlightenment
and the history of the book but it was his many cogently argued
and beautifully written New York Review essays on the creation
of a Digital Public Library of America
that made him a natural choice as the first Donoho lecturer.
His arguments mix historical anecdote with close legal
and moral reasoning and are masterful displays of passion
and advocacy and careful analysis.
He makes clear the challenges and possibilities inherent
in such an endeavor as well
as the central role that computation
and digital technology play in the story.
Now, the idea of such a public resource also has an important
Dartmouth connection as one of the first public calls
for a national computer-based library can be found
in a lecture by former math professor and 13th president
of Dartmouth, John Kemeny.
Given just over 50 years ago at a conference convened
to mark the 100th anniversary of the founding of MIT.
In his lecture, "A Library for 2000 A.D.," Kemeny advocated
for a national research library.
A central resource for the nation's research community.
Kemeny argued that the sheer projected volume
of textual resources and the attendant problems
of information search
and retrieval would require digitized storage and access.
Now, I can't help myself from showing you a table
from Kemeny's talk that he used to illustrate the kinds
of problems he anticipated.
So this is what he viewed
as the big problem in search [laughter].
So, what you saw is 2 hours and 27 minutes and 45 seconds
to find a book so the walk to the library, finding the card
in the catalogue, up the stairs,
discovering the book is missing [laughter] would--
the majority of time spent waiting for Professor S
to return from lunch [laughter] but I have to say,
I love this for so many different reasons
but I also know it's true
because Professor S was my dear friend, Laurie Snell.
So I know that is [laughter] this is a true story.
For those of you who know Laurie,
this is perfectly believable.
So Kemeny's lecture is simultaneously prescient
and of its time as he gives us a detailed vision
for an electronic library but one deeply rooted
in tape drives and phone connections.
Great advances in technology as well as computer science,
mathematics, and statistics have made possible the much more
ambitious goal that is a Digital Public Library of America.
From conception to execution,
Professor Darnton has led the charge for its creation.
Robert Darnton is Carl H. Pforzheimer University Professor
at Harvard and Director of the Harvard University Library.
He is a Harvard graduate, a Rhodes Scholar,
a former reporter for the New York Times, and was a professor
on the Princeton History Faculty from 1968 until 2007
when he returned to Harvard.
He has held numerous visiting positions and is a member
of the boards of many prestigious institutions
including the New York Public Library.
He is the author of many scholarly essays
and books including The Forbidden Bestsellers
of Pre-Revolutionary France
which was a National Book Critics Circle Award winner.
Professor Darnton is the recipient of numerous honors
and prizes including a MacArthur Fellowship and most recently,
a National Humanities Medal received just a few weeks ago
from President Obama.
In the words of the citation, Professor Darnton has a, quote,
"determination to make knowledge accessible to everyone.
As an author, he has illuminated the world of Enlightenment
and Revolutionary France, and as a librarian, he has endeavored
to make his vision for a comprehensive national library
of digitized books a reality," end quote.
We look forward to his sharing of that vision
with us this afternoon so please join me
in welcoming our first Donoho lecturer,
Professor Robert Darnton.
[ Applause ]
>> Thank you Dan.
Well, thank you I'm delighted to be here.
It's good to see snow, it's the first snow I've seen this
winter practically.
But I'm especially honored to be giving the first
of the Donoho lectures and I'm delighted
that you could come yourselves.
I think the Neukom Institute is a good thing
and I think probably, a lot of you care about books
so that makes me feel good.
They can be digitized, they can be printed on paper
but they are actually doing rather well,
even the old fashioned printed codex.
Believe it or not, this year,
more books will be published worldwide than ever before.
1 million new titles almost all of them in print.
It's amazing.
So when people tell you the book is dead, just shake your head
in disbelief, the book is not dead.
It makes me think often of one of my favorite graffiti
and it's actually in the men's room of Firestone Library
in Princeton, you know, and you may have seen one like this.
It begins, "God is dead," signed Nietzsche [laughter]
and then underneath Nietzsche is, "Dead,"
signed God [laughter].
The book is absolutely not dead.
And I think there are a lot of misconceptions actually
about the digital and the analog as if they were at war
with one another, you know, as if they occupied opposite
and inimical positions on some kind of technological spectrum.
One thing we've learned from the history of books is
that one medium does not displace another.
Believe it or not, after the invention or reinvention
of movable type by Gutenberg, manuscript publishing increased
and it continued to thrive for 3 centuries after Gutenberg.
It was often cheaper to hire scribes to copy out a whole book
for an edition of less than 100 copies.
So people were publishing manuscript books well
into the 18th century, some even in the 19th century.
And I think today, we all understand
that the radio did not kill the newspaper,
and TV didn't kill the radio, and the internet didn't kill TV,
we live in a, I think, an environment of media
that gets richer and more complicated
but it's certainly not one in which it's just zero-sum games
and the printed book is gone.
That does not mean, however, that all is well in the world
of printed books, I mean there are a lot
of very unhappy publishers, authors, booksellers,
and even the occasional librarian, I think, Jeff.
There is pressure all over the place
and that's really the subject of my talk.
So I'd like to begin if I may by quoting Thomas Jefferson,
the devil can quote Thomas Jefferson but I like to do
so anyhow and I've done this in other settings
because of his famous remark in a letter that he wrote
in 1813 developing a metaphor about light.
So you should think of the Enlightenment, light in the form
of a candle which he called a taper.
So I'll give you the full quote
and I hope we will all feel enlightened and then I will try
to take it from there.
"If nature has made any one thing less susceptible
than all others of exclusive property, it is the action
of the thinking power called an idea,
which an individual may exclusively possess as long
as he keeps it to himself; but the moment it is divulged,
it forces itself into the possession of every one,
and the receiver cannot dispossess himself of it.
Its peculiar character, too, is that no one possesses the less,
because every other possesses the whole of it.
He who receives an idea from me, receives instruction himself
without lessening mine; as he who lights his taper at mine,
receives light without darkening me."
Now, you might think that the 18th century ideal
of spreading light, enlightenment, sounds archaic,
in fact it may sound suspiciously professorial.
We professors like to invoke Thomas Jefferson
and I especially like to invoke people like Condorcet
who was all for spreading light and who was convinced
that there would be indefinite progress,
thanks to the publication of books.
But that can sound naive and the point
of course is what then could have been
or actually was merely Utopian is now possible thanks
to modern technology, the internet.
Still, having said that and probably a lot
of people would agree on the face of it,
it can nonetheless sound Utopian so I would
like to invoke another kind of American spirit,
that can-do, pragmatic, no-nonsense, business-plan type
of spirit in order to argue my case.
The point is that you can invoke even economists
to develop this sort of an argument.
After all, one of the most hard-boiled concepts
of modern economics is that of a public good.
Public goods such as clean air, efficient roads,
hygienic sewage disposal, and adequate schooling,
benefit the entire citizenry and one citizen's benefit
does not diminish that of another.
Public goods are not assets in a zero-sum game.
But they do carry costs, upfront costs usually paid
for by taxation and this occurs at the production end
of services and facilities that the public enjoys.
So the Jeffersonian ideal of access to knowledge
as a public good does not mean that knowledge is costless.
We enjoy freedom of information, but of course,
information is not free.
Someone had to pay for Jefferson's taper.
Now, I would like to emphasize that point
because few people have any idea of what it actually costs
to provide them with the information
that they consult every day on the internet.
Instead, they complain about information overload.
My daughter, for example, laments the fact that,
as she puts it, "The amount
of medical knowledge doubles every 2 years."
And yet she knows nothing about another tendency that undercuts
that doubling, namely commercialization.
According to several reliable sources,
the amount of research published
in medical journals actually does almost double
over 2-year periods.
The US National Library of Medicine reports that the number
of medical journals increased from 3,472 in the year 2000
to 4,866 in the year 2010.
And the number-- excuse--
the statistics, but can you imagine having
to read this many journals if you're a doctor
and you've got a patient with some distressing symptoms
that you can't quite figure out.
Well, you'd go to the internet of course and you have
to find the right article.
Well, citations to the articles in these journals increased
from 10.7 million in 2000 to 18.3 million in 2010.
How could anyone find all pertinent information even
with a powerful search engine in this ocean of publications?
I don't know, but of course, doctors keep trying.
There was an average of 3.5 million searches a day in 2009
just in medical journals.
What the doctors fail to understand is
that their searches take place in fenced-off territory
which belongs to the publishers of the medical journals.
The publishers charge exorbitant prices
for access to their terrain.
And their enclosure movement increases while
cyberspace expands.
So, yes, more knowledge is being constantly produced
and an increasingly small percentage
of it is accessible to the public.
Now I'd like to discuss this tendency in relation to the cost
of journals and books and then
to suggest how it could be reversed by treating knowledge
as a public good provided through the internet.
If I could come back to the example of my doctor,
I should explain that he works in a teaching hospital attached
to the Harvard Medical School,
which means that the Harvard library gets to pay
for all of these journals.
Through his computer and his smart phone, he has access
to all of the journals that the medical school buys for him.
And that is almost entirely 99.9 percent in the form
of electronic journals whose total cost for Harvard just
for medical journals is 2.5 million dollars a year.
The journals include the Journal of Comparative Neurology,
priced at 29 thousand dollars for a year's subscription;
Brain Research, 20 thousand-- 23 thousand dollars a year;
Biochimica, 20 thousand dollars a year;
and I could go on and on.
The cost of academic journals in general has increased
at 4 times the rate of inflation since 1980.
Everything indicates that it will continue to increase
along the same trajectory,
maybe it'll level off a little bit but not much.
The price increase for journals in general,
scholarly journals, is estimated to vary
between 4 percent and 9 percent.
And this is after catastrophic ratcheting
up of the cost of journals.
So health may be a public good but information
about health is monopolized by publishers who extract
as much profit as the market will bear.
Now this is not news to librarians
and your librarians can tell you a lot more about it.
They have had to make room in their budgets
for the hyper inflation of journal prices year after year
for at least 3 decades.
But this news is not understood by many academics.
They actually perpetrate a kind of irrationality at the heart
of the system because of course we academics do the research.
We write the articles, we serve as referees
for articles written by others.
We also serve on the editorial boards
of the journals often as editors.
And then we buy back the result of our labor which is all done
for free at outrageous prices.
But of course, we don't pay for it, our library does.
And very few academics understand how this can dent a
library's budget.
You know, there used to be a rule of thumb
that libraries would spend roughly 50 percent
of their acquisitions budgets on periodicals
and 50 percent on monographs.
Well, those percentages have changed
and now many libraries spend 60, 75 percent, some 90 percent
of their acquisitions budgets just on serials.
So that is-- means that they're not buying monographs anymore.
And if they don't buy monographs, think of what effect
that has on university presses in subjects
like the social sciences and humanities.
They have to cut back on it because they depend
to a considerable extent on sales to libraries.
And if they cut back on the production of monographs,
what's going to happen to these new PhD students
who must publish or perish?
There's a kind of vicious circle
at work throughout this whole system
and the system just looked at, in those terms,
seems to me extremely irrational.
Well the publishers would have an answer to this.
They would say that, first of all, there's a kind
of naive idealism behind the Jeffersonian Principle.
We live in a real world of, well capitalism.
And it's true that not only did Jefferson discount the cost
of his taper, he had a-- not a very successful business plan
when it came to trying to run Monticello.
You may know that he really went bankrupt and there had
to be a collection to keep him from going broke.
Of course, it can be expensive to publish a journal.
I'm not denying that at all.
And look at what a good journal does.
There are referees to be organized.
It can be a big job.
There are-- there's editing to be done,
there are pages to be designed.
The journal has to be marketed; the money has
to be collected and redistributed.
There is a lot of what publishers call "added value."
And I'm not trying to minimize that in the slightest.
So yes, journal publishers deserve a fair return
on their investment, but what is fair?
Last year, Elsevier's profit margin was 36 percent
on an income of 2 billion pounds.
Other publishers often report profits of 20 to 40 percent.
In its analysis of their practices,
Deutsche Bank concluded, "If the process really were as complex,
costly, and value-added as the publishers protest that it is,
40 percent margins would not be available."
Now, publishers could answer
by invoking the famous marketplace of ideas.
They could turn the Jeffersonian argument
against itself by asserting that in a free market of ideas,
the best will triumph.
Whether embodied in articles or books or any other format,
the best will sell and sell at a fair price determined by demand.
Unfortunately however, demand is not flexible in the world
of scholarly periodicals.
Publishers create journals
in certain highly specialized sectors
where they can have the territory all to themselves.
Once they have staked out their turf, hired a prestigious board
of editors because prestige is crucial in this game,
and begun to accumulate a following among readers,
they can keep competitors out.
In fact, competition rarely exists
in the esoteric sectors of science.
And the big 3 publishers, Elsevier, Wiley-Blackwell,
and Springer, publish 42 percent of all journal articles.
They group journals in bundles selling the newer
and more obscure publications along with the more famous ones
and if you, the librarian, want to unbundle the bundle,
then somehow mysteriously as you weed out journals you don't want
so much, the price of the ones you do want increases
so that it's more expensive than it ever was in bundled form.
They have a hundred tricks to keep
up that 40 percent profit margin.
Well, I won't go on and on, but I do think that we've got
to do something about this.
And for one thing, we should be able to share information
but many of these contracts have nondisclosure clauses.
So that I can't know what Jeff pays for his bundle,
except that it's probably too much.
[Laughter ]
Well, the market is being manipulated and monopolized
and I think that private gain is eclipsing--
has eclipsed the public good.
Jefferson's taper has been reduced to an ashen glimmer.
How long can the price gouging continue?
Well, we may be nearing a breaking point
because some research libraries have simply found it impossible
to pay for the continuous increase in the journal prices.
They refuse to renew subscriptions
and ride out complaints from their faculty members
who demand, of course, an unlimited supply of knowledge.
And sometimes, they, for example, provide pay-per-view,
that is to say a faculty member or a student can pay just
to read a particular article that may have been recommended.
Now the cost for Wiley-Blackwell
to read one article is now 42 dollars, to read one article.
Few libraries have summoned up the courage to walk away
from the table in contract negotiations when faced
with unbearably expensive terms.
Well, you might think, just tell them, "I'm the customer, the--
isn't the customer always right?
I won't accept that increase of nine percent
in this year of 2012."
It doesn't work like that because if I did
that in Harvard, there would be a revolt on the part
of the faculty beginning perhaps in the medical school
where my doctor ex-- he thinks he's just going
to have an endless flow of access.
So the alternative to this is not, I think,
simply to negotiate harder but to develop another strategy
that would reverse the economics of journal publishing.
I think we should treat it as a public good
in a manner analogous to the funding of the public roads.
This could be paid for at the production end
and made available free to users.
Although the US government already subsidizes a great deal
of research and also publishing through the NIH and the NSF,
it probably can't do much more.
I don't think we expect more money to be coming
out of Congress for this sort of thing.
But as you know, the NIH has a huge budget.
And in 2000-- I'm forgetting the year, I think it was 2008,
the NIH passed a requirement
that any research based on NIH funds, that is public funds paid
for by the taxpayer, had to be made available
to the public, to the taxpayers.
That was a mandate and it makes a certain amount of sense.
Don't you think that public-supported research ought
to be available to the public?
But, there-- a bill was introduced to the House
of Representatives in December to withdraw this mandate,
the so-called Research Works Act
which is just going to wipe it out.
And who is behind this bill?
The lobbies.
I mean the lobbies are the ones that have been--
I would say, manipulating copyright, among other things,
for the advantage of private gain while neglecting the
public good.
So we are in a very difficult situation and I think
that we have to begin to work out something
that would work better and for the public good.
Now, things are changing fast, no need to tell you this.
We're going through a fascinating transitional period
from a world that was entirely analogue to a world
that will someday be overwhelmingly digital.
But now, you know, things are being mixed-up together
in fascinating ways that I find enriching in general
but also very expensive.
Clear distinctions no longer exist between text and data,
articles and books, searching and researching, posting
and publishing, authorship and readership, writing
and mixing and mashing.
The blurring of boundaries and the untethering
of knowledge may make us feel uncomfortable but they belong
to a transformation of the landscape of information
that I think will create new room for the public good.
To illustrate this point, I would like to devote the rest
of my talk to one of these possibilities,
the attempt to build a digital library
that will make the cultural heritage
of the United States available to all Americans and, in fact,
to everyone in the world.
Now, although fantasies about a mega, meta,
macro library go back to the ancients, the possibility
of actually constructing one is recent.
It dates from the creation of the internet,
1974, and the web, 1991.
Google demonstrated that the new technology could be harnessed
to create a new kind of library.
One that, at least in principle, could contain all
of the information and all of the books in the world,
but Google Book Search is a story of a good idea gone bad.
As first conceived, it promised to do what Google did best,
that is it was going to be a search service
so that you could request information
and Google would provide on the screen
of your computer the word searched, surrounded by snippets.
So you would get a few sentences
that would tell you how this word figured in a book
and often, Google, even better, provided information
about the nearest library where you could get that book.
I mean, I think that was terrific
but that's not what happened, why?
Well because Google digitized books not only--
that were not only in the public domain, they crossed
over the boundary that separated public domain books
from books covered by copyright.
They came first to Harvard where we told them,
"Public domain books, yes; copyrighted books, no."
But they also came to Michigan and Stanford and the University
of California which did permit them
to digitize copyrighted books.
So instantly, they were sued for infringement of copyright
by the Authors Guild and the Association
of American Publishers.
And as soon as the suit was filed,
secret negotiations began.
These negotiations lasted almost for 3 years, and at the end,
there was an announcement
of something called "The Settlement."
Now the settlement was just the--
almost the opposite extreme
from the original Google search service.
It was the creation of a well, gigantic digital library
in which the libraries that had provided the books for Google
to digitize would be permitted to buy back digital copies
of those very same books at a price to be determined by Google
without any public oversight or any limits.
And when I myself read the settlement
as it was being negotiated and finally announced, it seemed to me
that the price of access to this library could expand
out of hand just the way the price
for periodicals had gone up.
So it-- I thought, for one, that this was not a good idea.
But I wasn't the only one who thought this
because of course it had to be submitted to a court.
And it was, as required, submitted to a court
in the Southern District of New York Federal Court.
It was a very interesting moment,
if I may open a parenthesis, the judge in this case was a--
is a man called Denny Chin.
And his story is a real American success story in many ways.
He arrived at age 5 with parents from China.
His father worked in a Chinese restaurant,
his mother swabbed floors;
they lived in the Hell's Kitchen district of New York City.
And he worked hard as a young boy.
Won a scholarship to Princeton, went to law school,
practiced for a while, became a judge,
and now he found himself the judge to decide a case which is,
I think, of monumental importance for the whole future of books.
And you could say that Sergey Brin and Larry Page
of Google also represent a spectacular American
success story.
You know developing Google in a garage and all
of that sort of thing.
They, too, were scholarship students with bright ideas.
So the two are confronted in this fascinating court case
which finally was announced, I mean,
the decision was announced in-- last March.
What Judge Chin said was, "The Google Book settlement
is a monopoly in violation of the Sherman Antitrust Act."
And he based his argument on memos that were furnished
by the Department of Justice.
Very persuasive memos, I think I've read them all,
and even memos furnished by the Federal Republic of Germany
and the French Republic.
Not to mention, more than 400 people who were, you know,
sending amicus briefs to the court saying,
"This is a bad idea," because it really came
down to dividing a pie.
Google would get 37 percent of the profits
and the litigants would get 63 percent, the public?
The public had no place in it whatsoever.
So Google Book Search was declared illegal.
And furthermore, it was a class action suit
so that Judge Chin had to certify
that the Authors Guild really represented authors in general.
And he said, "No!
They don't nor does the Association
of American Publishers represent all publishers."
We could talk about class action suits if you like
but it's a fascinating example of trying to stretch one aspect
of American Law to cover something entirely new
in this new digital world, and it didn't work.
So my point is simply that Google chose the path
of commercialization when confronted with this conflict
about infringement of rights.
Whatever the fate of Google Book Search might be,
I think we must now take up where Google left off.
And in fact, we've been doing this long before Judge Chin made
his decision.
In October of 2010, I called together a group of leaders
of foundations, of libraries, computer scientists,
mathematicians, for an informal conference about the possibility
of creating an open access digital library.
And I sent just a page and a half of general description.
The group came together
and almost immediately said, "We can do it.
We can do it technologically and we can do it financially."
So the heads of all the major foundations
in this country have said, "We will support this idea."
So the funding is there without going to Congress.
I mean the hopes of getting anything
out of Congress now are not great.
So we organized a steering committee, a secretariat
with a small grant from the Sloan Foundation
to cover administrative costs and other small costs.
Then we created 6 working groups which took up aspects
of a very complicated program
to make this library actually happen.
And these working groups spread out throughout the country.
Lots of people were recruited from many different sectors
and they're hard at work at it
now, dealing with 5 basic problems which are the scope
and content of this DPLA, Digital Public Library
of America, its possible costs, the legal problems it will face,
its technical architecture, and its governance.
So, I don't have too much time left but I'd
like to discuss each of these 5
and then open the floor for questions.
Scope and content, the DPLA will not draw
on one gigantic database unlike Google.
It will be a distributed system which will aggregate collections
from many research libraries, museums, and other institutions.
It will provide one-click access to documents
in many formats including images, recordings, and videos.
But at first, it will concentrate
and consist primarily of books, books in the public domain.
Google digitized about 2 million books in the public domain
and copies of its digital files have been deposited
in this great repository known as the HathiTrust that some
of you might know about.
The Internet Archive which is a not-for-profit,
open access digitizing operation founded
by Brewster Kahle also has accumulated well
over a million digitized copies of public domain books.
So this exists already and what we want to do is
to bring it all together and make it available to everyone.
This material is largely already accessible online
so you might say, "Well okay, sounds great but what's
so wonderful about making it accessible all over again?"
And the answer is, this is just the beginning,
this will be the preliminary version of things
and it will include lots of material
that is undreamt of really by Google.
By that, I mean special collections.
Every great research library, such as yours here
at Dartmouth has fabulous special collections
and you've often digitized quite a bit of them.
At Harvard, we have something called the
"Open Collections Program"
in which we have digitized 2.3 million pages
of documents related to certain specific themes such as "Women
at Work" and "Immigration" and "Voyages
of Scientific Discovery."
They're available on a repository we created free
of charge to everyone in the world.
The People's Republic of China came to us and asked
to digitize 51,500 of our rare Chinese books
because they are not available in China
and so we've worked out an agreement
and they will be made available on the Digital Public Library.
The-- another example but there are many,
many examples, concerns newspapers.
Every state has digitized all of the newspapers
in all of its collections.
They've been aggregated at the state level
and these 50 aggregated collections are
in turn being aggregated by the Library of Congress.
It's going to deposit all of them
in the Digital Public Library of America.
So, already for starters, I think,
we will offer a fabulous treasure trove of information
to the American public.
Unfortunately, copyright laws prevent the public domain
from extending beyond 1923.
That means that most 20th century literature exists
in what librarians call a black hole.
It's covered by copyright and cannot be digitized
and made available without infringement of copyright.
So, what will our scope be and where will we draw the line?
Assuming we could get around the copyright laws, I'll discuss
that in a minute, some of us argue
that the DPLA should cons--
have everything right up to the present.
My own argument, but it's just mine, is that no, it should stay
out of the current market place for books.
And that we should have what you could call a moving wall
so that anything published during the last 5 years
or maybe the last 10 years would not be available
and we would not therefore threaten the interest
of publishers and authors who are understandably trying
to make money from the publication of books.
How long is the shelf life of a book?
I don't have an answer to that question but first of all,
most books never make it onto the shelves of bookstores.
Bookstores are going
out of business to a considerable extent.
But if the book did make it onto the shelves of a bookstore,
if the bookstore existed,
how long would it be there?
Few days? Few weeks?
Then they disappear, remaindered or, you know, sent back,
returns, it's the plague of publishing.
So, I think actually that it would be in the interest
of many authors, once the economic demand
for their book has disappeared, to make those books available
for maybe a small fee or indeed, free of charge.
Authors want readers and I'm sure many
of you here are authors and--
okay, academics don't hit the jackpot very often
but you might make enough in royalties to take your husband
or wife out for supper once a year [laughter].
Anyhow, that's my case generally.
So, I think that we could, if possible,
have a fabulous library
that could include virtually everything
but not invade the commercial market.
Second point concerns costs.
Now, as I said, the DPLA will almost certainly be a
distributed system which will aggregate collections
that already exist in dozens of research libraries.
When it opens, it will probably contain only this basic stock
which I've just described, but from that point onward,
it will grow as fast as its budget permits.
So what should its budget be?
Well, of course a lot of money will go
into the technological infrastructure
and then the administration,
although we hope it will not be heavily administered,
we don't want a lot of management in it.
But we can take the example of Europeana, I don't know how many
of you know Europeana.
It's an aggregator of collections in Europe.
So it's actually located in the Netherlands
and it aggregates already aggregated collections
in 27 European countries and it's not yet gone online.
It tried it once a few years ago and crashed
because there was so much demand.
But it will be going online again soon
and we are coordinating the design of our DPLA
so that it will be interoperable with that of Europeana.
In other words, we're working towards a worldwide system
of distribution.
Europeana's budget is only 5 million euros a year,
a very modest budget.
But of course, it doesn't digitize itself,
it doesn't undertake preservation,
it doesn't do a lot of things that we want
to do for-- at the DPLA.
What would it cost if the DPLA led a major effort
to digitize books that are covered by copyright but are
out of print or commercially unavailable, as Google calls it?
Well, Brewster Kahle, who's digitized more
than a million books for his Internet Archive, says,
"I can digitize a book for 10 cents a page,"
and if you take a book of about 300 pages,
that comes to 30 dollars, not really very expensive.
Others think that's not really realistic although Brewster has
a lot of experience of digitizing.
They say, "Well, a dollar a page is more like it,"
there's a big debate as to what the costs are but they're going
down all the time thanks to technological improvements.
So, it's true that we can-- we must not only digitize
but we have other functions to fulfill such as, well,
perfecting metadata, that is descriptions
of how you can locate the book.
We must do something about preservation.
It's fine to digitize but you have a responsibility
to preserve the book and we estimate
that preservation will be something like 20 percent
of the digital or digitizing costs.
And there are other possible services such as curation
and the development of apps of all sorts.
In fact, we will have a pilot project that we call a
"Scanabago" something like a Winnebago that will go
out to small towns in Massachusetts as a pilot
and just offer to scan tiny little special collections
in public libraries and then to help
that library develop its own collections too.
So, we see quite an important grass roots element
to all of this.
So, by combining ballpark or if you like,
back of the envelope estimates, I would think
that we could digitize a million books a year on an annual budget
of 75 to 100 million dollars.
The budget of the Library of Congress, by the way,
in 2010 came to 684 million dollars.
So, if a grand coalition of foundations contributed,
say 100 million a year, a great library would exist
within a decade.
Double that rate and the library would soon be the greatest
that ever existed.
But we don't need to rush, we must do the job right
and unfortunately, Google in much
of its digitizing didn't do the job right.
You've probably seen books in which a hand appears covering
up the page because the scanner forgot
to remove his or her hand.
And, then there's the metadata of Google which is famous because,
you know, they don't talk about books, they just talk
about information or data points.
And, so they cata-- they catalogued Walt Whitman's Leaves
of Grass under "Gardening."
So, we can do better than that and we are trying
to design a library that will last for centuries.
But it could grow gradually on a budget
of let's say only 10 million dollars a year.
Third point has to do with legal issues.
Dan, am I going over the time?
I should-- I can hurry up--
>> No, actually it's fine.
[Simultaneous Talking]
>> Am I-- okay, so I don't want to keep you too long
and you may have questions, I'm almost finished.
But the legal issues, I really see this
as the most important problem of all.
Of course, the DPLA must and will respect copyright.
How far can it go in making accessible books that are
out of print but covered by copyright?
Well, that depends on the possibility
of modifying the copyright laws by legislation
or perhaps on other strategies.
Now, the history of copyright in the United States goes back
to Article 1, Section 8, Clause 8 of the Constitution
which sets 2 objectives, I quote, "To promote the progress
of science and useful arts, by securing for limited times
to authors and inventors the exclusive right
to their respective writings and discoveries."
The first copyright law passed in 1790 struck a balance,
I think, between those 2 objectives, how?
By giving authors the exclusive right to the income
from their books for 14 years renewable once.
And that provision in 1790 actually was--
took up the model provided by Britain.
In that first copyright act in existence, the Statute of Anne
in 1710, exactly the same objectives are announced,
and a balance was struck between the welfare of the public
on the one hand and that of the booksellers and authors
on the other and this deal was the same,
14 years renewable once.
But the Company of Stationers, the booksellers,
publishers protested and there was a series of trials
that go right through the 18th century.
They're really quite fascinating involving people
like Alexander Pope, you know, great figures
in English Literature, and they were finally decided
in a famous case of 1774, Donaldson versus Becket,
by the House of Lords, actually, 14 years renewable once.
So that's where we got our model and, you know, it wasn't so bad.
The basic point was no perpetual copyright even though the best
lawyers in England had argued for it.
Copyright should not be perpetual.
Now, in the debate over the re--
so-called extension of the Copyright Act of 1998,
in the American Congress, the key actor was Jack Valenti,
the lobbyist for Hollywood, basically.
And Valenti was asked, "Mr. Valenti, do you believe
in perpetual copyright?"
And he said, "No.
Certainly not, I think copyright should be forever minus
one day."
[Laughter] So that's what we're up against
and you could say that Jefferson's taper has almost
died out.
The current limit of copyright is the life of the author
plus 70 years, or 95 years in the case of corporate creations
like Mickey Mouse. It's known as the Mickey Mouse
Copyright Renewal Act of 1998.
This is in practice more than a century for every book.
And so we're keeping the vast bulk of our literature
out of the public domain where I think the bulk of it belongs.
So what can we do about this?
Well, it's a long and complicated story, you could say
that further legislation would solve the problem.
However, lobbyists have had such a heavy hand in attempts
to pass legislation especially about orphan books,
books whose copyright owners can't be identified, that--
it's a rather discouraging story.
There were attempts in 2006 and 2008 to pass orphan book legislation
and people I know who followed this closely said the lobbyists
massacred it, especially the 2008 bill which was never passed,
so badly that it would have been worse
than having no bill at all.
So it's difficult to summon up much confidence about help
from Congress, about a lot of things, not just copyright.
[Laughter] What about fair use?
Well, in the Copyright Act of 1976, there are such things
as Sections 107 and 108, which have been gone over endlessly
by lawyers and others because that's where the provision is
made for fair use.
And you use that of course today in your library
when you allow copyrighted articles to be made available
in classes for example.
Can we expand this Fair Use Act in such a way that it would hold
up in court for a public, not-for-profit
institution devoted to the public good?
I think that would be wonderful if we could do it
but my lawyer friends say very dicey.
And furthermore, if we-- once we get the DPLA up and running,
would we want to take the risk of so many suits especially
when damages begin at 100,000 dollars?
So I think we probably won't follow that path.
What else could we do?
Well, there are other things and I won't go into this
in too much detail 'cause I'm taking too long,
but there is a fascinating provision
that is working very nicely
in Scandinavia called Extended Collective Licensing Agreements.
And if you want, we could talk a little bit more about that.
But let me come to the last 2 points,
first the technical architecture, I mean,
I was delighted to meet some
of your young computer scientists here,
a very impressive group.
And we are working very closely with computer scientists
for the technological infrastructure of the DPLA.
In fact in June, we announced what we called a "Beta Sprint"
and invited computers--
or anyone, anywhere to submit suggestions,
maybe an overall blueprint for the technological design
of the library or particular apps or aspects of it.
60 people or groups responded instantly,
and finally 40 competed.
There was lot of enthusiasm in the world of computer science
for this kind of a project and they were given 3 months
in this so-called "Beta Sprint" to come
up with a finished suggestion.
A blue ribbon jury passed judgment
on which ones was-- ones were the best.
And last October, we held a large meeting
in Washington hosted by the Library of Congress,
the Smithsonian Institution, the NAH,
and we announced the winners.
There were actually 6 winners
and we are incorporating their ideas in the first prototype
which we will have developed in 2 months
and then it will be submitted for further critiques
and finally, it will be ready when the DPLA gets up
and running in April 2013, April 2013, that's tomorrow.
The race to this deadline may seem breathtaking
but it's fueled by enthusiasm and energy.
Leading figures in computer science, information technology,
and library science have assured us that the task is doable
and we will get it done.
Last point concerns governance.
Here, I shall be brief, not just because I'm running out of time,
but because we haven't made major decisions.
For example, where should the DPLA be located
when it has offices?
Who should lead it?
To whom should it be responsible?
How will it formulate policy and administer its services?
The present secretariat is doing a good job but it won't continue
after April 2013 because it's a Harvard operation
and people love to-- I don't know if you suffer this much
at Dartmouth but they love to point the finger at us and say,
"Elitism," I mean, the number one cuss word when it comes
to the throwing around of epithets.
And it's not going to be a library for the elite;
I mean I think that advanced researchers will benefit
enormously from it.
But we're aiming this library at ordinary people.
Think of community colleges, a community college
in North Dakota or Alabama which doesn't really have a library.
We can make available to them a library that will be as great
or greater than the Library of Congress, free of charge.
K through 12 schools, retirement homes,
individuals who are just curious about things and would
like to find out more, scattered all around the country and all
over the world, this is the sort of public we're aiming for.
The public that goes to public libraries and leaders
of public libraries are part of our steering committee
and are helping us design this.
But we haven't reached final decisions
about, if you like, governance.
We just know that we are aiming at a very broad constituency,
we might create a-- an independent new organization
by taking advantage of Section 501(c)(3)
of the Internal Revenue Code and setting
up a tax-exempt corporation.
At present, most people involved in this effort agree
that it should not be part of the Federal Government,
it should be free of political pressures of any kind.
It might resemble maybe the National Academy of Sciences
or perhaps the BBC, in fact, however,
it won't resemble anything because nothing
like it has ever existed, a library without walls
that will extend everywhere
and contain nearly everything available
in the walled-in repositories of human culture.
E Pluribus Unum. Jefferson would have loved it. Thank you.
[ Applause ]
>> We do have time for a few questions,
[background noise] I'll let you orchestrate how [inaudible]
do this.
>> Okay, yes ma'am?
>> In the publishing world, we know that it's very hard
for many writers and editors to earn a living [inaudible].
So what-- for example, in Scandinavia
when someone takes a book out of the library,
they often get some royalty [inaudible].
So what royalty [inaudible] understand
or help developers feel settled on for this process.
>> Right, can you hear me okay?
>> Yup.
>> Well, the Authors Guild is adamant
about continuing the Google suit, I mean, I could go on
and on about the Google suit but you see
after Judge Chin declared the suit unreceivable,
it reverted to the original copyright suit
and I think the publishers are going to make a separate deal,
but the Authors Guild is pushing this still to this day.
And-- so, it's very militant about trying
to protect the royalties of authors
and that's understandable.
Authors deserve royalties.
So what will we do about it, that's the question.
Well, in the case of Norway, every Norwegian has the right
to read every book in Norwegian--
in Norway and the owner of the rights is paid a certain sum
of money per page read.
That sum of money comes from the--
kind of escrow fund that is collected
by the Norwegian government and you might say to me,
"They have oil nearby."
[Laughter] And furthermore, there's something about life
in Scandinavia because there's a similar outfit in Sweden
and in Denmark and in Finland.
There's something about the sense of the public good
in these countries that I think is much stronger
than what we have here.
Still, it seems to me that we can create an escrow fund
but we must have the agreement of a representative group
of authors and of publishers to do this.
And I'm not sure how we can get their agreement.
So, we need to woo them but I actually, I mean,
I was a trustee of the Oxford University Press for 15 years.
It's a-- it's-- okay, a university press
but it's a huge press that sells a lot of books.
As some would say, it's a "Trade Press."
It's both trade and university press.
These-- it's devoted really to the spread of knowledge.
And I think a lot of publishers care
about literature that's why they went
into this not very lucrative trade in the first place.
So if we don't invade the current commercial market
and undercut them in that way, it seems to me we ought
to be able to win their support.
And we can do so by giving a reasonable royalty
for the consultation of these books that's my hope.
>> Yes?
>> If I understood you correctly,
the academic publishing, the journals and magazines,
talked a little bit out of the site [inaudible].
>> Right.
>> And I was on Laudenbach in Germany years ago
and we had the same problem there discussing this--
particularly with digital publishing.
And the idea that came up again and again there was
if everything is already in place by academics
on peer review to publications and support them,
it's basically the academics should do the work.
And the infrastructure's also the only other hand in terms
of digital publishing.
Why can't we do it on the [inaudible]
and just bypassing the entire tremendous
and scandalous cost for publishing?
Would that be-- wouldn't that be also an aspect we integrated
into this monogamous system of security?
>> I couldn't agree more.
Now, the attempts to reverse the economics of journal publishing,
I mentioned just briefly in passing, but it does involve not
so much the DPLA although it might someday.
It really involves instead processing fees.
So, the idea is to-- for universities,
to pay for what are sometimes called "author fees"
to subsidize articles that will go into open access journals
and often grants to scientists have a certain amount
for publication as well.
So there's hope for this.
And at Harvard, we have a program and we subsidize
up to a thousand dollars per professor.
This is beginning to spread and I'm happy to say
that Dartmouth is part of this attempt to reverse the economics
of journal publishing.
And you're right, you know, it's doable
but you probably know the story
of the Max Planck Institute in Germany.
They tried to do it, they held out for, I think it was
about 3 months, and then they collapsed
in the face of Springer.
So, it's not easy and I think it's going to take time
but it's got to work because it's so rational compared
with what we have now.
And so once we tip the balance in favor of open access,
I think that this will work
and there will still be closed-access journals, you know,
Cell and Nature are not about to disappear
but they don't represent the bulk of things.
So I'm hopeful in that respect.
>> There's a gentleman back there.
>> In New England, in many places,
there is a long-standing tradition
of municipal libraries, the town library,
what might be the effect
of a universal digital library on the town library?
>> Yeah. Well, that's a very good question and it's one
that we care about passionately.
We want to support and reinforce town public libraries.
So, we had a debate about even using the word "public",
you know, you could call it the Digital Library of America,
and frankly, I prefer that as a term because it--
you know, there's a danger of being misunderstood.
And so, some people might feel that if we provide all
of this material free of charge
that municipalities can reduce their budgets
for public libraries.
That's not the case.
So I think what will happen especially
if we have a moving wall, such as the one I described,
is that public libraries will continue to do what they do
so well to satisfy the demand of their--
demands of their users by making available current best-sellers,
current books of all sorts, DVDs, videos, magazines,
and that the Digital Public Library will provide them
with a vast corpus of works
that were published 10 years ago and beyond.
So I think it will enrich public libraries enormously.
And in fact, we have several public librarians
on our steering committee and they agreed with this.
So we are deeply committed to helping public libraries.
>> You talked briefly about quality issues with services
like Google Books in terms
of reproduction quality and quality control.
But you also talked
about reducing cost per page for scanning.
How do you see the DPLA interfacing
with special collections preservation efforts,
and what can determine your standard or the resolution
and the quality of expense that you're introducing?
>> Yeah. It's an excellent question.
I may not have an adequate answer to it because first
of all, a lot of this digitizing
of special collections has been done on the spot.
And so the quality is assured by people like your librarian
and your rare book collections
or wherever these works may be located.
I think it's fair to say that in general, when it comes
to digitizing special collections,
libraries take great care with them.
Certainly at Harvard, we do-- we have a huge digitizing operation
on the D-floor of Widener, it's expensive.
The quality is terrific but actually it's so good
that I think we could do with much worse quality
to get the great bulk of books out there and not, you know,
the medieval manuscripts would be digitized correctly
at high price-- at a high price.
So I think for the special collections
that are being digitized by libraries, the quality is not
so much a problem but the DPLA probably won't have--
we might set up quality standards
but we'll have no power
to determine how the digitizing is done provided it meets
certain standards.
So I don't think that's a real problem but it's a good point
because we want to lower the costs of digitizing.
Now, I-- maybe some computer scientist can correct me,
but the information I have from our large section of IT
in the Harvard library is that the costs of preservation,
for example, are going down tremendously year after year.
Some say by 50 percent each year.
And the costs of scanning,
scanners are now quite inexpensive.
So I think the technology is working in our favor
and that this is not going to be a major problem.
>> Maybe just one more question.
>> Mr. Darnton--
[ Inaudible Remark ]
>> Yes.
>> Can you tell us more about it?
>> Yes, that's called an Espresso Book Machine.
[Laughter] And the idea is you print a book
in about the time it takes to get an espresso coffee.
Now, we have one actually in the Harvard bookstore
across the street from Widener Library.
So you, the user, go into the book shop
and there's a computer there and you order a title.
The order goes to a digital database.
The text is returned and downloaded on a machine
in a matter of seconds.
The machine is a wonderful glass-enclosed printing machine.
>> I saw that.
>> And you saw that it worked.
It can print the text, trim the pages, attach a paperback cover
in less than 4 minutes.
And it can do so-- the prices vary
because the publishers set the prices.
But the prices are often 8 dollars for a paperback.
That means that you can, through Print on Demand,
have access to a whole world of literature if you happen to like
to read printed books instead of to read on reading devices.
And that's an example of what I meant
when I said I don't think the analogue and the digital are at war
with one another because here we're using great digital
electronic technology to reinforce the printed book.
And I've-- they've printed several of my own books,
I think that the Print on Demand copy is every bit
as good as the original.
So, that's-- we're doing lots
of wonderful things right now and I--
>> [Inaudible] through Oxford be available to that one?
>> That depends on how [inaudible]--
>> At Cambridge.
>> And Cambridge, sure-- [Inaudible Remark] Yeah,
especially at Cambridge.
>> And-- [Laughter]
>> I was a student at Oxford but never mind.
>> Thank you very much.
>> All right.
>> No, but there is something called
"Oxford Scholarship Online" which is an attempt
to bring the backlist of Oxford
within the paying power of a large public.
>> All right Bob, thanks so much.
[ Applause ]