The Scientific Method

Tycho Brahe was perhaps one of the greatest observational astronomers in history.  He amassed a tremendous amount of absolutely critical data on the motion of bodies within our solar system.  Interestingly, Brahe never accepted the Copernican heliocentric view of the solar system.  For years, he was incredibly protective of the data, refusing to share it with anyone.  Given that he (with historical hindsight) was wedded to a dead-end view of the solar system, his data was not initially valuable.

It was not until Keppler, and later Newton and others, were able to get access to his data that the data became truly useful, and it became the foundation of one of the greatest revolutions in thinking and understanding in human history.  Had Brahe insisted on the confidentiality of his data to his death, developed as it was with significant financial contributions from the state and various universities, his work would have been irrelevant, applied narrowly to support a failed theory.

I am reminded of this story when I read this:

In a landmark ruling, the UK Information Commissioner’s Office has ruled that Queen’s University Belfast must hand over data obtained during 40 years of research into 7,000 years of Irish tree rings to a City banker and part-time climate analyst, Doug Keenan.

This week, the Belfast ecologist who collected most of the data, Professor Mike Baillie, described the ruling as “a staggering injustice … We are the ones who trudged miles over bogs and fields carrying chain saws. We prepared the samples and – using quite a lot of expertise and judgment – we measured the ring patterns. Each ring pattern therefore has strong claims to be our copyright. Now, for the price of a stamp, Keenan feels he is entitled to be given all this data.”

I guess I am confused as to what the point of a public university is, if it's not to contribute knowledge to the public domain.  Had Mike Baillie formed his own company with private investors to gather and monetize tree ring data, he would be absolutely correct, and I would be the first to defend him.   I would love to see the grant application or funding proposal he submitted for this work.  “We would like public funds in the amount of X to gather tree ring data and keep this data absolutely secret so that no one can check or replicate our results.”   Actively fighting replication is not a very positive indicator of confidence in one’s scientific results.

    A commenter at Bishop Hill’s blog suggested that Queen’s should have offered Keenan a post and invited him to analyse the data in co-operation with Baillie. It’s a pity they didn’t think of that.

    Surely the value of the data is increased the more scientists get to analyse it. It isn’t as if Baillie is actually going to be deprived of the data by passing it on to someone else. Also if his work was publicly funded even in part it would appear that he has no case.

    I vaguely recall a story about Carlsberg brewery managing to isolate a single yeast cell and developing a yeast culture from it that produced a superior beer. They really did have a case for keeping it to themselves and thus stealing a march on their competitors. They considered, however, that brewing good beer was more important than profit and gave away a sample of the yeast to all the other breweries.

    “We are the ones who trudged miles over bogs and fields carrying chain saws. We prepared the samples and – using quite a lot of expertise and judgment – we measured the ring patterns.”

    So, properly speaking, he argues the data should actually belong to his graduate students (and whatever other menial laborers were employed in this effort). His protests amount to nothing more than muddle-headed whining. If the research was publicly funded, then it's public property, plain and simple. One has to wonder whether his objections are based in a misguided self-interest (in whatever monetary value might accrue to the data, for example), or merely reflect a desire to isolate any findings from criticism or questioning by other parties. Whatever the case, it’s a pretty sad statement on the motivations, character and clear thinking of current academia.

    “I guess I am confused as to what the point of a public university is, if it's not to contribute knowledge to the public domain. Had Mike Baillie formed his own company with private investors to gather and monetize tree ring data, he would be absolutely correct, and I would be the first to defend him.”

    I don’t entirely agree… or, I don’t think it’s quite that simple.

    The university, as a legal entity distinct from the government, would have an ownership right to all of the IP it generates, regardless of whether the research was funded from the institution’s own revenue or from government grants.

    Consider the following scenario:
    1. A university receives a government grant for pharma research.
    2. The ensuing research leads to development of a new patented drug.
    3. A pharma company contracts with the university, paying it a revenue royalty in exchange for the exclusive right to market the highly profitable drug.
    4. A generic drug manufacturer demands that the government revoke the drug patent on the grounds that since the research was publicly funded, the IP should be in the public domain and the drug should benefit everyone rather than generating monopoly profits for a single business.

    Would you agree with the generic drug maker here? I doubt it… and even if you did, there would be plenty of established legal precedent that the IP should not become public domain.

    “I would love to see the grant application or funding proposal he submitted for this work.”

    This, in my opinion, would be a key issue. If the contract for the government grant required that the results of the research be made public, I’d agree entirely.

    All of the above, however, is based on the unproven assumption that the tree ring data constitute intellectual property, the legal definition of which I cannot claim to fully understand.

    In the practical world, if you’re a scientist and you want your research to be taken seriously, restricting access to your data isn’t likely to help your cause.

    Let me add one other suggestion.

    The IPCC, or any other government-sponsored entity, should require the following condition of contributing scientists: If you want your research included in our reports, you have to make your data public.

    End of problem.

    There is more here than public versus private ownership rights. The tree rings are used to support government initiatives that will have dramatic public and private consequences. We have a right to see the evidence. If this were really about tree rings – there would be very little attention.

    Russ R:

    I think you are making a category mistake.

    What should be made public is the publicly funded research itself, not the drug that resulted from it.

    Russ R, Remember the Golden Rule of the Arts & Sciences – Them that has the gold, makes the rules.

    It seems ever more evident to me that there is a huge and ongoing movement to confuse academia with science.
    If someone wishes to surround some research with arcane rituals and blessings and purifications and call them peer review and publication, while withdrawing the hem of the robe from the populace, then I say – let the research be done by commercial public tender, with the results published for all to see.
    We have no need of these sneering elites in their ivory towers, let us be done with them.

  • “There is more here than public versus private ownership rights. The tree rings are used to support government initiatives that will have dramatic public and private consequences. We have a right to see the evidence. If this were really about tree rings – there would be very little attention.”

    I agree 100%. Anyway, as someone else pointed out, if you were a scientist with compelling evidence that humanity needs to act NOW to prevent disaster, the obvious thing to do is to share that evidence immediately and without holding anything back.

    Hey Skipper:

    Fair point, there may well be a distinction to be made between the tree-ring data in question and the ownership of a drug patent (the definitions of Intellectual Property aren’t always very clear).

    While my analogy may not be entirely apt, I still can’t agree that everything funded by government grant should automatically be public domain (unless that was explicitly part of the grant agreement).

    The profit motive still drives a lot of research and innovation, and a system where any amount of government funding resulted in a loss of economic ownership and control would have a serious impact on the effectiveness and productivity of those grants (i.e. misaligned incentives, adverse selection, etc.).

    That said, I’d be much happier if there were no government research grants in the first place.

  • The good professor clearly has no case. The tree ring data was gathered at public cost for the benefit of the public.

    The public has the right to it. If he had any confidence in his work he wouldn’t mind, but his reluctance to share it makes me think it may be faulty.

    I am an engineer and I signed an agreement that any patents generated by my work belonged to my employer.

    I am certain he signed a similar agreement. If he didn’t, he should have the option of signing or resigning.

  • PS: There’s one “P” in “Kepler”. Otherwise, outstanding.

    As for Dr Baillie, was his work financed or underwritten by the government? Here in the U.S., it usually is, and the results are public domain (unless otherwise specified – and it sometimes is). As ADiff points out

    More than that, how can the research be peer-reviewed unless other scientists examine the data?

    The fact that a study was peer reviewed doesn’t mean that the reviewers examined the original data. Peer review only addresses the following issues:

    Was the research original/significant?
    Are there any gross errors in methodology?
    Are there any issues that need to be clarified/expanded upon for publication?

    Peer review of a study is not an independent verification, merely a quick check to make sure (hopefully) that no really bad errors or badly written papers make it to publication.

    I believe that there is an EU REQUIREMENT that ENVIRONMENTAL DATA is PUBLIC!!!

    Turns out that FOI requests to CRU and similar institutions get processed a lot faster and with fewer problems if made under this rule instead of FOI!!!

    For those claiming rights for the University to their Research, they shouldn’t have voted in all those Socialists!!


    Doesn’t the tree’s right to life, liberty, and the pursuit of CO2 trump Mike Baillie’s claim to copyright?

    Re. Hey Skipper/Russ R.

    Just want to clear up the difference between this case and the drug example – I work for a drug company and we do sometimes collaborate with academics. A novel drug is classed as an invention, and as such is patentable. A patent is unlike copyright (e.g. for a novel or photograph) in that it must be applied for (and paid for). To qualify it must be non-obvious (i.e. require invention) and not in the public domain – you cannot patent something that is already known or that you have previously disclosed. A patent grants exclusive rights for a limited period to exploit the invention commercially. In return, full disclosure of the invention is required. Once a drug patent is published, anyone can see what it is and how to make it, but they may not make it to sell it. I doubt that tree-ring data is patentable, and even if it were, it would have to be disclosed; it would just mean that no-one else could commercially exploit it. Copyright would be a different matter: we are all entitled to keep our personal photographs, diaries, etc. confidential – but why would a scientist want to keep data confidential once results are published?

    It would appear from reading the University’s response in the Guardian article that the issue to them is not “what” is released, but rather “to whom” it is released and to what degree of inconvenience they are subjected. Their objections to release of the data include: “…because it would be too time-consuming; because the data does not amount to environmental information; because the research is unfinished; because the data is private property, commercially confidential and of “negligible” public interest – and because Keenan would not understand them.”
    Sorry. I work for a publicly-funded organization and everything that is submitted to our files becomes public domain. The only exception is where the information contains private, personal information or proprietary design information (i.e. copyright/patent info). Environmental data does not fall under this category. Every single piece of environmental monitoring data is public, regardless of who collects it or who pays for it. We do not make decisions based on what is convenient for us and we do not judge the credentials of who comes through the door requesting information.
    Apparently, freedom of information is just a burr under Mr. Baillie’s saddle, as he is quoted as saying that the ruling is “a direct, and unpleasant, off-shoot of the information revolution. It now appears that research data can be demanded, and indeed obtained, by anyone.” Oh, gosh darn it all anyway.
    What is unsettling here is the implication that simply because he had the chainsaws and the people and the time to go out and collect this data, trudging through the mud on top of it all, he is also the most qualified to interpret the data. Although I don’t dispute Mr. Baillie’s credentials, it is tantamount to Neil Armstrong and Buzz Aldrin saying they own the moon rocks, can decide what they tell us and that no one else has the right to analyze them.

    Thanks for the great post(s). I too have mulled over the analogy between Brahe/Kepler and AGW settled science/skeptics. Yes, Brahe was officially supported, well-funded, and although an excellent and honest observationalist, was absolutely clueless in rejecting the heliocentric theory, and nothing of a mathematician compared to Kepler. Kepler was a monstrous genius – his laws of planetary motion an almost superhuman feat of pure brilliance and persistence. Kepler was dirt-poor and supported by practically no one. He patiently waited years for Brahe to share. Reading and understanding the Brahe/Kepler story makes you want to scream at the pettiness and injustice.

    Roll the tape forward to the 21st century…hmmm…not much has changed it seems.

    Unless and until the data and details (codes, math, etc.) associated with a particular piece of research are made freely available, that piece of research should be considered hearsay and nothing more.

    Nobody can critique or support a conclusion without full access to the inputs used to support that conclusion. Anybody can say whatever they want about secret data and secret algorithms, but thinking people should simply call those conclusions bullsh*t and ignore them.

    The fact that climate ‘science’ argues that only the priesthood should be permitted to see what’s in the magic box says about as much as you want to know about climate ‘science’.

    Well, I understand a bit the issue of those scientists.

    I can remember when I was doing my PhD (on a totally different topic), most data was “dirty”, i.e. not sufficiently ordered/commented for anybody except me to use. It would be too time-consuming to make sure all data is cleaned and commented appropriately; we would get no work done. However, processed data which is the base for the theory/case studies presented in the publications is usually ordered and commented – i.e. it may be provided to the people checking your paper (mostly colleagues and co-authors) and potentially (never happened to me) to reviewers.
    I can imagine that the raw tree ring data is a zillion text files with non-standard names and formats (because with experience the format evolved, but reformatting old data would again be a waste of time). Only the resulting compound information (e.g. the indicator as computed, for instance, by averaging all tree rings for a given year) and some statistics about it (e.g. the number of measurements for that year) are then in a state allowing distribution; and that is NOT what other people request, they want the raw data…

    Now in a perfect world, every bit of data would be in a clean and perfectly documented state. I am not proud of sometimes having a mess. But even when funded by a government, the money is not that big and there is time pressure, so only the required steps are done.
    Result: It would cost many months of unproductive (i.e. without resulting publication) work to deliver data in a good state. And if you just give out whatever you have without cleaning everything, people will obviously think there are many mistakes (because they did not follow the thinking of the people involved day after day, with evolving ideas and objectives in the data collection, and will compare “pears with apples”) and attack, resulting in more wasted time.

    I hope to make it clear why it is a pain in the … to answer such requests. The more so if you know that the requester is actually looking for any mistake to attack your work (and there are always some mistakes. Always.)

    That said, I do not support hiding info. I also agree that IPCC or other supranational/national organization should push (and finance the extra costs!) for an open publication of raw data for policy-critical papers.

    Just wanted to add: a junior scientist (PhD student) is well paid in my country (Switzerland) in comparison with many other countries. We used to (that was ca. 10 years ago) get about $2,000/month (in a country where the lowest salary, e.g. a supermarket cashier's, is $3,000/month). That means we could live on it, if barely, and did not depend on family.
    But that’s not really luxurious, and we would riot if we were forced to spend an extra 6 months or so (on a 3-4 year thesis) cleaning up and commenting every bit of data under those conditions.

    So if you want all data to be distributed, you have to hire people to manage raw data. That costs money. Say one guy for every 10 researchers? But a guy with a real salary, because he will not work for “papers and thesis”. So say a 20%-30% increase in your science spending? (just an idea).
    And what of the benefits? In theory you hired the professor because he is supposed to be the expert, and so the best guy to interpret his data. Distributing the data will in 99.9% of cases be totally useless: nobody will use it (if the topic is not trendy) or nobody will be able to make better use of it. Only in a few cases will some important alternative interpretation come out. Is that an effective way to spend your money?

    I leave the question open… but remember that everything has a cost, including transparency, and that the benefits should always be compared to the costs… It is a bit simplistic to assume that all publicly financed research must of course be publicly available down to the last raw data without understanding the burden.

    My favorite story about Tycho Brahe: While a student, he got into an argument with a classmate over who was the better mathematician. A duel ensued during which the other guy sliced off part of Brahe’s nose. Thenceforth, he wore a metal (silver?) simulacrum over the injury. Echoed by Lee Marvin in a movie (can’t recall the title) and the Prince Of Pop.

    I understand the situation, but it's not relevant. If your research can’t be replicated, tested, etc., then whatever conclusions you arrive at are irrelevant. What's the difference between fabricated data and data which is poorly organized, cherry picked, ‘groomed’, adjusted, ‘quality controlled’, and so on? Intent? I seem to recall the handling and recording of data was a big part of undergraduate lab work. So, I’d say that if you have data that is a ‘mess’, then the research is crap. Garbage in garbage out – or not. Maybe the data completely confirms your conclusions. It doesn’t matter, if you can’t test it because you can’t access the data it ain’t science and the research should not be permitted to be published. But that would impede grants, tenure, etc., which are driven by quantity, not quality.

    Re: Mingy
    In theory I agree. In practice, as you said, you need 2, 3, 4 papers to get your PhD. Nobody mentioned anything about “good” papers :-).

    More seriously – a publication contains:
    (a) a “lab” procedure (we cut trees and measured rings), (b) a description of the data handling (we computed the date of the ring using method xx. We then averaged all ring widths for a given year), (c) results (here is the resulting plot) and (d) discussion/conclusion (as tree ring width is proportional to temperature [see ref…], we can get past climate info).

    The part of science that can be tested is basically the whole procedure – do I agree with the way they cut trees, the way they measure rings, the way they date them, the way they average them, the way they compute temperatures from width. This is normally fully published.
    I can also replicate it by chopping my own trees.

    Simply replicating the analysis on distributed raw data is a possibility, but it is dangerous. The one who got the data may know that a given tree was twisted, hence the rings asymmetric. Or any of a million other small hints that you see when you’re in the field/lab. Just taking raw data and missing the reality check that you get with first-hand data gathering is therefore dangerous. Again, unless the raw data are “prepared” with the objective of making them available to third parties, with associated costs (and basically another paper that only describes step (a) above in more detail).

    Just to be clear, the tenets of science include testability, replicability of results, and whether those results are falsifiable.

    Let's say you publish a paper based on ‘your’ tree ring data. I have no access to ‘your’ tree ring data. I march out and core my own trees. I apply my adjustments to ‘my’ data (which nobody else is allowed to see). Perhaps I do so based on my interpretation of your published paper. Chances are pretty slim I’m going to arrive at the same results as you. Setting aside coincidence, if we do have similar outcomes, this could mean i) the theory is more or less correct OR ii) we have similar biases with respect to the outcome.

    These biases are not necessarily going to be reflected in shenanigans like ‘hide the decline’. They could be inherent in the process by which we gather, select, and process the data. If the data gathering system is inherently biased, it is not necessarily the case that such a bias can be detected through the processed data.

    As I have said, it is pretty well known that most peer-reviewed scientific papers are subsequently shown to be flat out wrong, trivial, untestable, etc.

    The way it works is that if a ‘well respected’ scientist (i.e. one who has published a large number of non-controversial and oft-cited papers, which we can assume by statistics alone are likely to belong to the ‘crap’ pile) publishes a paper with processed data, methodology, conclusion, etc., it is assumed that there is no need to check the results – most of which were prepared by underpaid, considerably less experienced and reputable grad students who know one thing above all, and that is the desired outcome.

    This is all fine and good. Scientists and wannabe scientists cranking out vast volumes of research based on proprietary data which is too difficult to share. If they were dealing with a typical branch of science, then good on them. Nobody cares whether a nuanced change occurs in the metabolic pathway of a mitochondrion except extreme specialists (though, actually, you’d better make your data available in those cases). If you're wrong, somebody will show you are wrong.

    Climate ‘science’ is different. They are screwing with our lives. Grinding out vast quantities of scientific trash which all come to the correct and only funded conclusion is pretty dangerous, don’t you think?

    Would you take a drug if the raw data associated with its safety and efficacy were considered proprietary and confidential because they had to be groomed and selected and processed and were, in any event, in such a shambles that it wasn’t cost effective to share them?

    “So if you want all data to be distributed, you have to hire people to manage raw data. That costs money. Say one guy for every 10 researchers? But a guy with a real salary, because he will not work for “papers and thesis”. So say a 20%-30% increase in your science spending? (just an idea).”

    Over here, that’s what undergrads are for. You pay them in college credits or, less frequently, at near minimum wage rates (here maybe $6-8/hour). They are then overseen by the PI, postdoc, or grad student for maybe 1 hour a week. So one week’s worth of work often comes at the cost of only one hour of the grad student's time (here maybe $20/hour once you include the cost of tuition). And in the end this is all still paid for by public money. This is just part of the cost of doing research. You simply have to store your data in a well documented fashion that allows outsiders, or say the next undergrad or grad student, to understand wtf you’ve been doing. If that is not possible you might as well not have done the work in the first place. So cost is irrelevant.

    I found the following paragraph most interesting. It was from Doug Keenan’s report on his adventures with QUB. It seems not only won’t they release the information to him, they won’t even let it be analyzed by anyone else. Right now, it’s unpublished and unanalyzed. But as far as they are concerned their job was only to collect it and store it, and, as their lab is now closed, that’s the end of the story. Is this even data in any meaningful sense of the word?

    “It is notable that QUB continues to withhold its data even though, in 2009, the tree-ring laboratory at QUB was effectively closed. The closure was primarily due to the lab lacking funds, which presumably resulted from having almost no research publications (i.e. the lab had not been producing anything; so funding agencies declined to support it). The dearth of publications occurred even though the lab has some very valuable data on what is arguably the world’s most important scientific topic—global warming (as outlined here). This problem arises because the QUB researchers do not have expertise to analyze the data themselves and they do not want to share their data with other researchers who do.”

    So we have a public university, doing scientific work for a public funding agency on a question of world-shaking importance, that refuses to produce anything of value?

    Prof. Baillie’s refusal to share his data is understandable. Nobody would be willing to hand over to strangers, for free, what he has devoted years to. But I don’t think this morality can apply to academia. It doesn’t matter whether he’s from a public or a private university. If we are accusing him now only because he works at a public university, what if he moves to a private one? Would he then be justified in refusing to share scientific work that might have a huge impact on the whole world?

    Please excuse my English. I’m not a native speaker.

    Cat Ballou, 1965.

    I don’t care what the challenges of documentation are, or what the legal distinction between a patented invention (a particular use of data) and public knowledge is. (For example, we all know that photovoltaic systems can garner DC voltage from sunlight. What can be patented is the particular brand of PV system you wish to sell.) When you're dealing with science that affects public policy and expect the support of the public, you’d better make your data public. Unless you wish to dictate from on high. We’ve overthrown dictators before. And will, again. Really, look at history, it’s right there for anyone to see. Any government that sought to rule from behind closed doors failed. Without exception.

    A scientist couldn’t patent the process of conduction through a doped semiconductor (transistor or diode) but he could patent his process for making one. And most any true scientist can’t wait to publish and have it stand the rigors of independent examination. It puts a few feathers in his cap. Most scientists are not wealthy and their reputation is all they have.

    Sure, anybody is free to keep proprietary information. There may or may not be issues of who owns what, depending on funding. If a researcher at a public institution does work which is partly or completely funded by other interests, does the data belong to the scientist, the public, or proportionately to all those who funded it? I don’t know. I believe public funding puts that data in the public domain – but different laws may have different interpretations in different countries.

    What I do firmly believe is that if the data is not available then the ‘scientific conclusions’ should be ignored. Simple as that. If you have a scientific theory or study or whatever based upon purported data and the data cannot be verified or cross checked or independently processed, then, nobody has a basis upon which to assume the paper has any validity.

    Very simple: keep your data till death but your ‘science’ has no value.