Monday, May 24, 2010

垃圾DNA的新發現會讓人吃驚!





2009-06-25 22:46
New findings about "junk" DNA may bring some surprises
author unknown
Abridged version

English source: http://www.gewo.applet.cz/health/DNA_1e.htm
Chinese source: http://hi.baidu.com/james1/blog/item/3907f4002cbe188ee950cdf1.html
Translation: 蕭光航

A group of researchers working at the Human Genome Project will soon announce that they have made an astonishing scientific discovery: they believe the so-called non-coding sequences (97%) in human DNA are no less than the genetic code of an unknown extraterrestrial life form.
一個致力於人類基因工程的研究小組很快將要宣佈一項讓人震驚的科學發現:他們相信在人類的DNA中存在的所謂「非代碼」基因序列(97%)即是一種地外生物形態的遺傳代碼。

The non-coding sequences are common to all living organisms on Earth, from molds to fish to humans. In human DNA, they constitute the larger part of the total genome, says Prof. Sam Chang, the group leader. Non-coding sequences, also known as "junk DNA", were discovered years ago, and their function remains a mystery. Unlike normal genes, which carry the information that intracellular machinery uses to synthesize proteins, enzymes and other chemicals produced by our bodies, non-coding sequences are never used for any purpose. They are never expressed, meaning that the information they carry is never read, no substance is synthesized and they have no function at all. We exist on only 3% of our DNA. The junk genes merely enjoy the ride with the hard-working active genes, passed from generation to generation. What are they? How come these idle genes are in our genome? Those were the questions many scientists posed and failed to answer - until the breakthrough discovery by Prof. Sam Chang and his group.
從黴菌到魚類到人類,這組非代碼基因序列在所有地球生物組織中皆常可見。小組組長Sam Chang教授說,在人類的DNA中,它們(非代碼基因)在總的基因數中佔有更大的比例。
非代碼基因又稱作「垃圾DNA」,多年前即被發現,它們的功能仍然是個迷。它們不像正常的基因那樣載有合成蛋白、酶及其他人體產生的化學物的信息, 非代碼基因序列沒有任何使用目的。它們不作表述,就是說它們承載的信息無法讀取,也沒有合成物質,它們根本沒有任何功能。我們存在於我們3%的DNA之 中。垃圾DNA只是喜歡搭在活躍的功能性基因上面,一代代地往下傳承。它們是什麼?為什麼這些閒置的基因會在我們的基因組裡?這些問題不斷地被科學家們提 出來,卻無法找到答案--現在終於被Sam Chang教授和他的小組取得了突破。

Trying to understand the origins and meaning of junk DNA, Prof. Chang realized that he first needed a definition of "junk". Is junk DNA really junk (useless and meaningless), or does it contain some information not claimed by the rest of the DNA for whatever reason? He once mentioned the question to an acquaintance, Dr. Lipshutz, a young theoretical physicist turned Wall Street derivative securities specialist. "Easy," replied Lipshutz. "We'll run your sequences through the software I use to analyze market data, and it will show if your sequences are total garbage, 'white noise', or if there is a message in there."
要想明白垃圾DNA的起源及意義,Chang教授覺得他首先需要一個對「垃圾」的定義。是否垃圾DNA真的就是垃圾(無用且無意義的),或者由於某種原因 它包含了其他DNA所不具有的信息?他的熟友Lipshutz博士是位年輕的理論物理學家,現在轉行在華爾街搞衍生證券,他跟他提到了這個問題。
「這好辦」Lipshutz說「我把你的基因序列用我那個市場數據分析軟件分析一下,馬上就知道你的那些序列是完全的垃圾,還是'白噪值(空值)'或者裡頭有什麼信息。」

Working evenings and weekends, Lipshutz managed to show that non-coding sequences are not all junk; they carry information. "To my surprise, the entropy of coding and non-coding DNA sequences was not that different," continues Lipshutz. "There was noise in both, but it was not junk at all. If the market data were that orderly, I would have already retired."
Lipshutz在晚上及週末進行測試,他得以證實了非代碼序列並不全是垃圾,它們是有承載信息的。
「讓我感到驚訝的是,代碼與非代碼的熵差距沒有那麼大」,Lipshutz說,「兩者都有空值,但絕不是垃圾。如果市場信息能像這樣整齊,我可能早得退休了。」
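The article does not say which entropy measure Lipshutz's software used. As a minimal sketch, assuming plain per-symbol Shannon entropy (the function name and the sample strings below are illustrative, not taken from the source), the idea of scoring how evenly the symbols of a sequence are distributed can be written as:

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Return the per-symbol Shannon entropy of a sequence, in bits."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # Sum -p * log2(p) over the observed symbol probabilities.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical inputs: four equally frequent symbols versus a single repeated one.
evenly_mixed = "ACGT" * 25
single_symbol = "A" * 100

print(shannon_entropy(evenly_mixed))   # 2.0 bits, the maximum for four symbols
print(shannon_entropy(single_symbol))  # zero bits: nothing varies
```

Single-symbol entropy is a crude measure: it cannot by itself separate a meaningful message from random text with the same symbol frequencies, so comparing two sequences this way only shows that their symbol statistics are similar.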

Eventually Prof. Chang was referred to Dr. Adnan Mussaelian, a talented cryptographer in the former Soviet republic of Armenia. The poor fellow barely survived on a $15-a-month salary and occasional fees for tutoring the children of Armenian nouveaux riches. A $10,000 research grant was a stroke of luck, and he began working like a beaver.
最後Chang教授找到了Adnan Mussaelian博士,他是前蘇聯共和國的天才編碼破譯員。可憐的傢伙現在靠一個月15塊美金的工資苟活,偶爾也給富家子弟上課賺點外塊。對他來說有一萬美金的研究經費是走了財運,他像一隻勤奮的海狸,開始賣力地工作。

Adnan promptly confirmed the findings of his Wall Street predecessor: the entropy indicated tons of information almost in the clear. It was not too strong a cryptographic system, and it didn't appear to be a tough problem. Adnan began applying differential cryptanalysis and similar standard cryptographic techniques.

Adnan很快肯定了前面那位華爾街夥計的發現:代碼的熵顯示出的信息幾乎是清晰的,這不是什麼複雜的加密系統,不像是很難解決的問題。Adnan開始進行差異性密碼分析及執行相關的標準密碼分析技術。

He was two months into the project when he noticed that non-coding sequences are usually preceded by one short DNA sequence. A very similar sequence usually followed the junk. These segments, known to biologists as Alu sequences, were all over the human genome. Being non-coding junk sequences themselves, Alu sequences are among the most common genes of all.
他在這個項目上花了兩個月時間,這時他注意到所有非代碼序列都以一段短的DNA序列開頭,而在這些垃圾代碼的結尾也有類似的代碼。 這些部分,生物學家都知道是ALU序列,其遍佈於整個人類基因組之中。作為非代碼、垃圾序列本身,Alu序列是所有基因中最常見的。

Trained as a cryptographer and computer programmer, and having no knowledge of microbiology, Adnan approached the genetic code as he would computer code. Just playing with the analogy, Adnan grabbed the source code of one of his programs and fed it into a program that calculates the statistics of symbols and short sequences, a tool often used in decoding messages. What was the most common symbol? Of course, it was "/", the comment symbol! He took some Pascal code, and it was { and }! Of course, the code between the comment slashes in C is never executed, and is never meant to be executed; it is not the code, it is the comment to the code!
Adnan受過的是密碼破譯員及電腦程序員的訓練,他沒有任何的微生物學知識,他把基因代碼當作電腦程序代碼來研究。在試著類推分析時Adnan將源碼放 入短序列符號統計程序中進行分析,這個分析工作常用來破解信息。最常見的符號是什麼?當然,它是「/」號,這是一個註釋的符號!在Pascal語言裡,這 個符號是{ 和 }!當然,在C語言裡,在兩個斜槓之間的代碼永遠不會被執行,也是永遠沒有要被執行的意思;它不是代碼,它是代碼的註釋!
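The symbol statistics described above can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not describe Adnan's tool, so the function name and the tiny C-like sample are hypothetical, and in real source files the most frequent symbol depends entirely on the code and its commenting style.

```python
import string
from collections import Counter


def punctuation_counts(source: str) -> Counter:
    """Count how often each ASCII punctuation symbol occurs in source text."""
    return Counter(ch for ch in source if ch in string.punctuation)


# Hypothetical, heavily commented C-like snippet used only for illustration.
sample = """
// add two numbers
int add(int a, int b) {
    // return the sum
    return a + b;
}
"""

print(punctuation_counts(sample).most_common(1))  # [('/', 4)]
```

In this sample the slash wins only because two of the five lines are comments; the same count run on Pascal code with brace comments would surface { and } instead.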

Being unable to resist the temptation to further play with the analogy, Adnan began comparing statistical distributions of the comments in computer and genetic code. There must be a striking difference. This should show up in statistics. Nevertheless, statistically, junk DNA was not much different from active, coding sequences. To be sure, Adnan fed a program into the analyzer: surprisingly, the statistics of code and comments were almost the same. He looked into the source code and realized why: there were very few comments in between the slashes, it was mostly C code the author decided to exclude from execution, a common practice among programmers.
無法抵擋的誘惑使Adnan更進一步地進行類推,他開始比較電腦程序註釋與基因代碼之間的統計性狀的區別。這裡頭肯定有很大的不同。在統計的結果中應該會 顯現出來。然而,垃圾DNA與活躍的代碼序列沒有什麼不同。為了確定一下,Adnan在分析中加了一個程序:驚異的是,代碼與註釋的統計結果幾乎是一樣 的。他檢查了一下源代碼,明白了原由:在斜槓之間只有很少的註釋,將其排除在執行之外,這與C語言碼的程序員通常的做法差不多。

Adnan, a religiously inclined person, was thinking about the divine hand - but after analyzing the spaghetti code inside the sequences he convinced himself that whoever wrote the small code was not God. Whoever wrote the active, small coding part of the human genetic code was not very well organized; he was a rather sloppy programmer. It looked rather like somebody from Microsoft, but at the time the human genetic code was written, there was no Microsoft on Earth.
Adnan是一個有宗教傾向的人,他想到了神的創造之手---但是當分析了序列內部的編碼之後,他覺得這段編碼不管是誰寫的,這肯定不是出於上帝之手。這 些人類基因的小段有效代碼寫得不是很工整,編寫得相當隨意,就像微軟某個人寫的一樣。只不過寫基因代碼時,地球上微軟還沒出世呢。

On Earth? It was like lightning... Was the genetic code for all life on Earth written by an extraterrestrial programmer and then somehow deposited here for execution? The idea was mad and frightening, and Adnan resisted it for days. Then he decided to proceed. If the non-coding sequences are parts of the program that were rejected or abandoned by the author, there is a way to make them work. The only thing one needs to do is to remove the comment symbols, and if the portion between the /*......*/ symbols is a meaningful routine, it may compile and execute! He selected some 200 non-coding sequences that most closely resembled real genes, stripped them of /*, //, and similar stuff, and after a few days of hesitation sent an e-mail to his American boss, asking him to find a way to put them in E. coli or whatever host and make them work.
地球上?這想法就像一道閃電劃過...是不是這些基因碼是地外文明的編寫者為所有生命形式所編寫,然後就以某種方式存放在這裡,以備執行?這種想法 真是又瘋狂又怕人,Adnan一連幾天使勁讓自己別這麼想。然後他決定繼續。如果非代碼序列是程序的一部分,且被作者放棄或丟棄,有一個方法可以使它們執 行。唯一要做的就是將註釋的符號去掉。如果在/*......*/中間的部分是有意義的,它將會被編譯並執行!他選擇了最類似基因的200組非代碼序列, 將它們類似/*,*/的去掉,猶豫了幾天後他發郵件給他的美國老闆,叫他想辦法將這組基因植入螺旋桿菌或其他的宿主,以便使代碼運行起來。
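On the software side of the analogy, removing the comment markers while keeping what they enclose is a simple text substitution. A minimal sketch, assuming C-style block comments only (the helper names and the sample line are hypothetical, and nothing here models DNA):

```python
import re


def commented_blocks(source: str) -> list[str]:
    """Return the text found between each /* ... */ pair."""
    return re.findall(r"/\*(.*?)\*/", source, flags=re.DOTALL)


def uncomment(source: str) -> str:
    """Delete the /* and */ markers but keep the text they enclosed."""
    return re.sub(r"/\*|\*/", "", source)


code = "int x = 1; /* int y = 2; */ int z = 3;"

print(commented_blocks(code))  # [' int y = 2; ']
print(uncomment(code))         # int x = 1;  int y = 2;  int z = 3;
```

Whether the re-enabled text still compiles is a separate question: a block that was commented out because it was broken or incomplete stays broken after the markers are removed.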

Biologists have attempted for years to make junk sequences express, without much success. Sometimes nothing turned out; sometimes it was junk again. It was not surprising. Grab an arbitrary portion of the excluded computer code and try to compile it. Most likely, it will fail. At best, it will produce bizarre results. Analyze the code carefully, fish out a whole function from the comments, and you may make it work. Because of Mussaelian's careful statistical analysis, 4 of the 200 sequences he selected began working, producing tiny amounts of chemical compounds.
多年來生物學家一直試著解釋這些垃圾序列意義,但沒有多少進展。有時是無功而返,有時得出來的又仍然是垃圾。這個毫不奇怪。隨手抓一把被隔離的 電腦代碼,然後又要把它編譯出來。這當然會失敗。最多它只能得出一些奇怪的結果。仔細分析代碼,從整個註釋中摸索出其功能,你還說不定能讓其運行起來。 Mussaelian選了200組序列進行了一番細緻的統計分析後,從中又選了4個,開始著手研究,結果產生了少量的化學合成物質。

"I was anxiously awaiting the response from Chang," says Dr. Mussaelian. "Would it be a more or less normal protein or something out of the ordinary? The answer was shocking: it was a substance known to be produced by several types of leukemia in men and animals. Surprisingly, the three other sequences also produced cancer-related chemicals. It no longer looked like a coincidence. When one awakens a viable dormant gene, it produces cancer-related proteins. Researchers began searching Human Genome Project databases for the four genes they isolated from junk DNA. Eventually, three of the four were found there, listed as active, non-junk genes. This was not a big surprise: since cancer tissues produce the protein, there must be a gene somewhere that codes for it! The surprise came later: in the active, non-junk portion of the code, the gene in question (the researchers called it "jhlg1", for junk human leukemia gene) was not preceded by the Alu sequence, i.e. the /* symbol was missing. However, the closing */ symbol at the end of "jhlg1" was there. This explained why "jhlg1" was not expressed in the depth of the junk DNA but worked fine in the normal, active part of the genome. The one who wrote the basic genetic code for humans excluded portions of the big code by enclosing them in /*... */ but missed some of the opening /* symbols. His compiler seems to be garbage, too: a good compiler, even from terrestrial Microsoft, would most likely refuse to compile such a program at all."
「我那時一直急著等待Chang的回音,」Mussaelian博士說,「大體上來說這個是不是一種蛋白,或者是一種罕見的東西?答案很讓人驚 訝:據知這種物質只有在患上了白血病的人類及動物體內才會產生。怪的是,其他三種序列也產生了與癌症性質有關的蛋白。這個看起來已經不再是碰巧的了。當一 個人喚醒了這個潛在的活性的基因的時候,它會產生癌性蛋白。研究人員開始搜索人類基因工程數據庫,把這4種從垃圾DNA中分離出來的基因資料從中找出來。 最後,找到了4個中的3個,列明為活性非垃圾基因。這個倒不是很奇怪:既然癌組織產生蛋白,那在某處肯定有一個基因含有這個功能代碼的!後面的才叫怪:在 活性的、非垃圾的基因代碼部分,有問題的基因(研究人員稱之為「jhlg1」,意思是垃圾人類白血病基因)並不是以邏輯序列打頭。如「/*」這組符號就不 在這裡。但是「JHLG1」的結尾處卻仍然帶著"/*"。這個說明了為什麼jhlg1在垃圾DNA部分裡毫無意義,卻在正常的、活性的基因組裡發揮作用。 編寫人類基本基因代碼的那位,將大代碼用/*...*/隔離了出去,但是寫漏了開頭的那個/*. 並且他的編譯器好像也很垃圾。任何一個好的編譯器,即便拿地球上的微軟來說,也很可能拒絕編譯這樣的程序。」
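The situation described here, a closing */ with no opening /* before it, is easy to detect in ordinary source text. A minimal sketch, assuming C rules where block comments do not nest (the function name and the sample string with its gene_* calls are hypothetical illustrations):

```python
import re


def unmatched_delimiters(source: str) -> tuple[list[int], list[int]]:
    """Return (stray_closers, unclosed_openers) as character offsets.

    Follows C rules: comments do not nest, so a /* seen inside an
    already open comment is ignored.
    """
    stray_closers: list[int] = []
    open_at = None  # offset of the currently open /*, if any
    for match in re.finditer(r"/\*|\*/", source):
        if match.group() == "/*":
            if open_at is None:
                open_at = match.start()
        elif open_at is None:
            stray_closers.append(match.start())  # */ without a preceding /*
        else:
            open_at = None  # this */ closes the open comment
    unclosed = [] if open_at is None else [open_at]
    return stray_closers, unclosed


# The opening /* before gene_a() is missing, so the first */ is a stray closer.
source = "gene_a(); gene_b(); */ gene_c(); /* dormant(); */"

stray, unclosed = unmatched_delimiters(source)
print(stray == [source.index("*/")], unclosed)  # True []
```

A C compiler reading this text would treat gene_a() and gene_b() as live code and then reject the stray */ as a syntax error, which matches the article's remark that a good compiler would refuse such a program.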


Prof. Sam Chang and his students began searching for genes associated with various cancers, and in almost all instances they discovered that those genes are followed by the Alu sequence (i.e. the comment closing symbol */), but never preceded by the comment opening /* sequence! "This explains why diseases result in cell damage and death, whereas cancers lead to cell reproduction and growth. Because only a few fragments from the big code are expressed, they never lead to coherent growth. What we get with cancer is the expression of only a few genes alien to humans, and symbiosis with some genes of bacterial parasites, which leads to illogical, bizarre and apparently meaningless chunks of living cells. The chunks have their own veins, arteries, and their own immune system that vigorously resists all our anti-cancer drugs.

Sam chang教授和他的學生開始尋找各種癌症有關的基因組,幾乎所有他們所發現的這類基因都是以邏輯序列結尾的(比如將蛋白作為註釋的「*/」號表示結 尾),但是卻都缺乏註釋的頭半部分的「/*」符號!這個表明為什麼疾病最終會細胞損壞並死亡,而癌症細胞卻能進行細胞複製並生長。因為只有少部分大代碼是 被意體化,它們不會協調地生長。我們從癌症中可以看到的是,只有少量異質的人類基因與一些寄生菌基因形成共生狀態,從而造成非邏輯而又怪異的,並且很明顯 是無意義的生命細胞塊。這些細胞塊有自己的血管、動脈及它們自身的免疫系統,強有力地抵禦抗癌藥物。



"Our hypothesis is that a higher extraterrestrial life form was engaged in creating new life and planting it on various planets. Earth is just one of them. Perhaps, after programming, our creators grow us the same way we grow bacteria in Petri dishes. We can't know their motives - whether it was a scientific experiment, or a way of preparing new planets for colonization, or a long-standing business of seeding life in the universe. If we think about it in our human terms, the extraterrestrial programmers were most probably working on one big code consisting of several projects, and the projects should have produced various life forms for various planets. Very likely in a rush, the programmers drastically cut down the big code and delivered a basic program intended for Earth. However, at that time they were (perhaps) not quite certain which functions of the big code might be needed later and which not, so they kept them all there. Instead of cleaning the basic program by deleting all the lines of the big code, they converted them into comments, and in the rush they missed a few /* symbols in the comments here or there, thus presenting mankind with the illogical growth of a mass of cells we know as cancer."
我們的推論是有一種更高級的地外生命形態參與了這個新生命體的創造並且將其培養於各個星球上。地球只是其中一個。也許,在生命程序編寫之後,我 們的創造者培養我們就像我們在培養皿中培養細菌一樣。我們不知道他們的動機是什麼-可能是一種科學的實驗,或者是在新的星球上殖民前的一種準備方法,或者 也可能在宇宙中培育生命體是一種長期實行的慣例。如果我們在人類的角度想一下,地外的生命編寫者很可能只在一個大代碼上同時做好幾個項目,這些項目應該已 經在不同的星球上產生了各種形態的生命體。編寫者們很可能做得很急,他們把大代碼功能大量地削減,並保留了用於地球的基本編碼。不過,那時他們(可能)不 太確信究竟大代碼裡哪些是以後用得到的,哪些是用不著的,所以他們把所有的代碼都保留了下來。他們沒有用刪除的方法將代碼行清除,而是把它們全變成註釋, 在趕工的過程中他們這一塊那一塊地漏寫了一些「/*」號,就這樣使得人類體內生長出了大量我們稱為癌的非邏輯細胞。


There are three options for solving the problem. Either delete all the /* symbols and comments and clean the basic code this way, or add all the missing /* symbols and avoid the illogical mixing of the basic code with the big code. Alternatively, in the third option, remove all the /*...*/ symbols and let the basic code work with the big code as a complete program. Unfortunately, none of these options is within our capacity. If we were able to efficiently insert genes into the chromosomes of living men, our breakthrough discovery would mean an instant cure for all future cancer cases, at least from the programmer's point of view. Theoretically, we can do it in a laboratory, but we have no practical means to implant the repaired DNA into living subjects. The mystery of "junk DNA" and cancer seems to be solved, but no quick cure should be expected. The best thing we can do now is to try nurturing a new, cancer-free line of humans with gradually debugged basic genetic code. That will take a long time. For us and our children, there is no hope on the horizon.
有三種方法可以解決這個問題。一是將所有的/*號及中間的註釋刪除,以此清潔(人類)基本代碼,或是將遺漏的/*號全部添加回去,以防非邏輯的 大代碼與(人類)基本代碼相混合。也可以採用第三種方法,將所有的/*符號清除,讓基本代碼與大代碼作為整體程序運行。但遺憾的是,這三種方法都不是我們 能做到的。如果我們可以有效地將基因插入到人類活體的染色體中,至少從編寫者的角度來年,這種技術突破意味著我們可以立刻治癒所有未來的癌症。神秘的「垃 圾DNA」及癌症問題看上去得到瞭解決,但不必期望有什麼速效的療效。我們能做的是儘量培養新的,帶有癌免疫的人類基本調試代碼。這要花費漫長的時間。對 於我們及我們的子孫來說,在地平線上,還看不到希望。

"However, from the programmer's point of view, there is also a positive outlook in it. What we see in our DNA is a program consisting of two versions, a big code and a basic code. The first fact is that the complete program was positively not written on Earth; that is now a verified fact. The second fact is that genes by themselves are not enough to explain evolution; there must be something more in the game. What it is or where it is, we don't know. The third fact is that no creator of a new work, be it a composer, engineer or programmer, from Mars or Microsoft, will ever leave his work without the option for improvement or upgrade. The ingenious part is that the upgrade is already enclosed - the "junk DNA" is nothing more than a hidden and dormant upgrade of our basic code! We have known for some time that certain cosmic rays have the power to modify DNA. With this in mind, a plausible solution is available. The extraterrestrial programmers may use just one flash of the right energy from somewhere in the Universe to instruct the basic code to remove all the /*…*/ symbols, fuse itself with the big code ("junk DNA") and jump-start the working of our whole DNA. That would change us forever, some of us within months, some of us within generations. The change would not be very physical (except for no more cancers, diseases and short life), but it would catapult us intellectually. Suddenly, we will be in a time comparable to the coexistence of Neanderthals with Cro-Magnons. The old will be replaced, giving birth to a new cycle. The complete program is elegant, very clever, self-organizing, auto-executing, auto-developing and auto-correcting software for a highly advanced biological computer with a built-in connection to the ageless energy and wisdom of the Universe. Software-wise, within us is either a short and diseased life, or the potential for a super-intelligent super-being with a long and healthy life.
This triggers puzzling questions - was the reduction to the basic code done by sloppy programmers in a rush (as it appears to us), or was the disabling of the big code a purposeful act which can be cancelled by a "remote control" whenever desired?"
「不過,從編寫者的角度來看,仍然是有其積極的一面的。我們從我們的DNA中可以看到,它是由兩個版本組成的:基本的人類代碼及大代碼。首要的事實 是,完整的代碼絕對不是在地球上完成的,這是經過確認的一件事。其二、基因本身不足以說明其進化性;這裡頭肯定還有更多的內涵,內涵是什麼,在哪裡,我們 不得而知。其三、參與新項目的創造者,不管是編寫者,工和師或是程序員,不管是在火星還是在微軟,他們都會為其後的改善及升級預留餘地。這裡巧的是升級程 序已經被包含在裡面了--就是「垃圾DNA」本身就是隱含的及潛在的使我們基本代碼升級的程序!我們已經知道某種宇宙射線有能力改變我們的DNA。知道了 這個,有就有令人稱道的方案。地外的代碼編寫者可以只消用一束相關的能量,在宇宙的某處就可以讓基本代碼將所有的/*號移除,將整個大代碼(「垃圾 DNA」)融為一體,一下激活我們所有的DNA。此舉將會永久地改變我們,我們有的人會在幾個月,有的人會在幾代人的時間內改變。這種改變在形態上不會有 很大變化(只是沒有了癌症、疾病及短促的壽命),但會使我們的智慧突飛猛進。突然之間,我們會暫時有一個類似於石器時代尼安特猿人與(古 石器時代)克魯麥農人共存的階段。老的循環會被更替,產生新的循環。整個程序是一套為高度生物電腦準備的帶有內嵌的永不老化的能量及宇宙智慧的軟件,其性 能優雅、非常聰敏而又能自我調節執行、自我進化自我糾正。而我們現在的則是短促多病的生命代碼,或者說是具備超級智慧、長壽健康的超級生命體潛力的生命。 這就引發了一些令為迷惑的問題--基礎代碼的刪減是因馬虎的編寫者倉促所為(我們看來),還是有意將部分大代碼功能廢除,卻可以在任何時候在需要時通過 「遙控」將其取消?

Sooner or later, we have to come to grips with the unbelievable notion that every life form on Earth carries the genetic code for its extraterrestrial cousin and that evolution is not what we think it is. This discovery may well shake the very roots of humanity - our beliefs in our concept of God and in our own power over our destiny. With the right paradigm, we may discover one day that all forms of life and the whole Universe are just one huge intellectual exercise in thoughts expressed mathematically, by Design, by Creator.
我們遲早會瞭解,每個地球的生命體都有著地外族人同樣的基因代碼,而進化並不是我們所想的那樣,這是種令人難以置信的觀點。這個發現或許會撼動 人性的根基--我們的信仰中意識形態的上帝,及我們自身凌駕於命運之上的能力。只要模式沒錯,某天我們會發現所有的生命形態及整個的宇宙只是一整個巨大的 設計或創造者智慧的思想的數學實踐。