The End of Theory?

Geriatrix · Jun 26, 2008

http://www.wired.com/science/discoveries/magazine/16-07/pb_theory/#/

"All models are wrong, but some are useful."

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right....

nauseous_monkey · Jun 26, 2008

Erm... no, theories are brilliant. They keep us on track and in check.

I'm capped but I sense this is straight from a blog?

Picard · Jun 26, 2008

nauseous_monkey said:
Erm... no, theories are brilliant. They keep us on track and in check.

I'm capped but I sense this is straight from a blog?

You are basically contesting the first paragraph. I can agree with you. But the rest of the quote regarding our concept of information is true, I think.

nauseous_monkey · Jun 26, 2008

Picard said:
You are basically contesting the first paragraph. I can agree with you. But the rest of the quote regarding our concept of information is true, I think.

Yep, that and the title of the thread.

w1z4rd · Jun 26, 2008

Yeah, the title is very misleading...

Geriatrix · Jun 26, 2008

Um, the title of the thread is the title of the article, guys...

nauseous_monkey · Jun 26, 2008

You scared us there for a second. I'd expect titles like that in PD or posted by Teleo.

Devill · Jun 26, 2008

In short, the more we learn about biology, the further we find ourselves from a model that can explain it.

There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including Google File System, IBM's Tivoli, and an open source version of Google's MapReduce. Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?

Very true, most models fail in one way or the other.

LOL @ the last lines....

Nick333 · Jun 26, 2008

Lol, when I saw that title I thought: "Here we go with another Teleo thread".

nauseous_monkey · Jun 26, 2008

Devill said:
Very true, most models fail in one way or the other.

LOL @ the last lines....

I bet you believe that, in some weird way, that affirms your religious fanaticism.

God is a pretty old idea as well... haven't seen it being changed much.

Oops... that probably belonged in PD.

The scientific model works. Get over it.

Hypothesis -> Theory

Don't forget all the steps in between.

stormchaser · Jul 13, 2008

The universe is mathematical in nature, and pi doesn't end.

Turtle · Jul 13, 2008

Scientific method "obsolete"? Boy are they flame-baiting with a BS nonsensical sensationalist title like that.

I doubt they believe it themselves, but they know one thing for sure, it's provocative, and will thus generate a lot of discussion (like this), leading to many "advertising impressions" and thus $$$$.

It's horribly irresponsible though; science is already in a flimsy enough position with the public as it is, and they are profiting by ripping it even further to shreds. Like fanning the flames of Rome burning while selling popcorn to the people watching it burn.

This doesn't "end" anything, it's just a useful newish tool in the scientist's toolchest (it's not really new either). There goes any shred of respect I might've had for Wired.
