Revision of 17th Sept (Monday) Machine learning 101 from Mon, 2012-09-17 18:53

Machine learning 對我來說一直有一種人工智能般的吸引力
但一直找不到一個實用又好玩的, 帶有例子的教學
因為下手寫些代碼才有實感呀

早一兩天我遇到一個 blog,
講的是基本的machine learning, 使用 javascript 實作, 介給大家
http://burakkanber.com/blog/machine-learning-genetic-algorithms-part-1-j...

而且花點時間翻譯一下 (太長的話會分成數段)

課程會從介紹基因算法開始。基因算法或者是我會談到的算法中最不實用的一個,但使用它作為開頭是因為它能很好的介紹「費用 (cost)」或者「誤差 (error)」的概念,還有局部甜點全局甜點﹣這些概念都是非常重要,也會出現在其他 Machine learning (ML) 的算法之中。

大自然的進化是基因算法的啟蒙者。還有其他例如人工自然網絡 (artificial neural networks)也取自生物學:進化本身便是最好的學習算法,大腦是人類所知最好的泛用解題工具。這兩個是重要的生物學上的存在,也是兩個成長最快的人工智能分枝。但當我談到基因算法的「學習算法」和自然網絡的「解題」的時候,我們會先放下自然網絡,專注在基因算法之中。

「泛用」是上面一個重要的字眼。相比其他特定的算法,你很容易便可以找到一個比泛用算法快。但這不是這練習的目的,也不是基因算法的目的。你使用基因算法不是因為你有一個複雜的問題,而是你有一個複雜的問題的問題。(complex problem of problems) 或者你有一組複雜的,無關的參數.

用兩足行走的機械化人作例子,這是一個非常困難的問題。手寫一個行走的步驟是注定失敗的,就算你能令它行走,下一組的機械人可能有不同的重量中心點,那麼先前的算法又會變得無效。而使用基因算法你可以「教育機械人去自行學習自走」而非「教育機械人去自走」

先介紹一下今天的題目:
建立一個可以重現 "Hello, world!" 的基因算法

很多的編程教學都使用 hello world 作開頭,而使用基因算法重現 hello world 看似很笨,而你直接在代碼之中寫一句 hello world 一定會更快。如果我們知道我們需要的結果,為何還要編程一個算法?很簡單,因為這是一個學習用的題目。下一個使用 PHP 的練習會令你開始明白,但我們始終需要一個開始的點

基因算法的基本是產生一組「可能的答案」然後算一下這些可能和最終答案是否足夠近似。相距太遠的會被永遠放棄。接近的會和其他溶合,或者變化一點點。這個做法是一直變化,看看我們是越來越近最終答案,還是相反的,越來越遠。

這些「可能的答案」是基因 (gene)。它們會交配,產生下一代,進化。它們會因為和目標太遠而死亡,或者優質的會交配產生下一代。你可能還想像不到這些如何可以相關到 hello world,但以下例子只是其中一個基因算法可以解決的問題之一。

一個基因是代表一個可能的答案。在這個例子中,基因是一串字符。我們可以假設字符長度為13 (“Hello, World!”),幾個可能的基因

Gekmo+ xosmd!
Gekln, worle”
Fello, wosld!
Gello, wprld!
Hello, world!

明顯最後一個是「正確」的,也是全局的甜點,最佳點。但我們可以如何計算其他基因的「費用 (cost)」?

費用函數 (cost function)
費用函數是種量度兩個基因的相似度的函數,而在這個例子,我們可以比較兩個學串中的每一個字符的話ASCII,再平方令數字一定為正數。例如字符 “A” (ASCII 65) 但目標的字符是 “C” (ASCII 67), 費用便是 4 (67 – 65 = 2, 2^2 = 4)。我們使用平方的原因是我們只想看對絕對相差,那一個大或小並不重要。所以你也可以使用絕對值函數 abs() 你可以自己試驗一下它們之間的差別

使用以上的費用函數,可以算出以下五組基因的費用:

Gekmo+ xosmd! (7)
Gekln, worle” (5)
Fello, wosld! (5)
Gello, wprld! (2)
Hello, world! (0)

在這個例子,因為問題是簡單,我們的目標是 0費用,然後便完成了。但有些情況便不同,我們只需要找到我們能找到的最低費用,然後符合某些條件之下完結,停止再尋找。或者你要找出的是最高的費用,相同的你需要一些條件去總結為「完成」。

費用函數是基因算法的重要組成部份,因為如果你聰明的話,你可以混合完全不相關的參數。這一次的例子之中只有字符,但如果你要建立一個導航應用,你需要將重量,距離,速度,交通燈等的參數,它們完全不相關但你可以將它們化為一個整潔的費用函數,而使用不同的參數比重。

Mating and Death

Mating is a fact of life, and we use it tons in GAs. Mating is a magical moment that two chromosomes in love with each other share. The technical term for mating is “crossover”, but I’ll continue calling it “mating” here, because that paints a more intuitive picture.

We haven’t talked about “populations” in GAs yet (we’ll get to that a bit later) but for now I’ll just say that when you run a GA, you don’t just look at one chromosome at a time. You might have a population of 20 or 100 or 5,000 going all at once. Just like in evolution, you might be inclined to have the best and strongest chromosomes of the population mate with each other, with the hope that their offspring will be even healthier than either parent.

Mating strings, like in our “Hello, World!” example is pretty easy. You can pick two candidates (two strings; two chromosome) and pick a point in the middle of the string. This point can be dead-center if you want, or randomized if you prefer. Experiment with it! Take that middle point (called a “pivot” point), and make two new chromosomes by combining the first half of one with the second half of the other and vice versa.

Take these two strings for example:

Hello, wprld! (1)
Iello, world! (1)
Cutting them in half and making two new strings from the alternating halves gives us these two new “children”:
Iello, wprld! (2)
Hello, world! (0)
As you can see, the offspring includes one bastard child that has the worst traits of both parents, but also an angel child that has the best traits of both.

Mating is how you get from one generation of genes to the next.

Mutation

Mating alone has a problem: in-breeding. If all you do is mate your candidates to go from generation to generation, you’ll get stuck near a “local optimum”: an answer that’s pretty good but not necessarily the “global optimum” (the best you can hope for).

(I was recently informed that the above paragraph is misleading. Some readers got the impression that the most important aspect of chromosome evolution is the mating, when in actuality a GA would achieve very little if not for the combined effects of both mating and mutation. The mating helps discover more optimal solutions from already-good solutions [many problems' solutions, like our Hello World problem, can be divided into optimal sub-solutions, like "Hello" and "world" separately], but it’s the mutation that pushes the search for solutions in new directions.)

Think of the world that these genes are living in as a physical setting. It’s really hilly with all sorts of weird peaks and valleys. There is one valley that’s the lowest of all, but there are also tons of other little valleys — while these other valleys are lower than the land directly around them, they’re still above sea-level overall. Searching for a solution is like starting a bunch of balls on the hills in random places. You let the balls go and they roll downhill. Eventually, the balls will get stuck in the valleys — but many of them will get stuck in the random mini-valleys that are stilly pretty high up the hill (the local optima). It’s your job to make sure at least one of the balls ends up in the lowest point on the whole map: the global optimum. Since the balls all start in random places, it’s hard to do this from the outset, and it’s impossible to predict which ball will get stuck where. But what you can do is visit a bunch of the balls at random and give them a kick. Maybe the kick will help, maybe it’ll hurt — but the idea here is to shake up the system a little bit to make sure things aren’t getting stuck in local optima for too long.

This is called mutation. It’s a completely random process by which you target an unsuspecting chromosome and blast it with just enough radiation to make one of its letters randomly change.

Here’s an example to illustrate. Let’s say you end up with these two chromosomes:

Hfllp, worlb!
Hfllp, worlb!
Again, a contrived example, but it happens. Your two chromosomes are exactly the same. That means that their children will be exactly the same as the parents, and no progress will ever be made. But if 1 in 100 chromosomes has one letter randomly mutated, it’s only a matter of time before chromosome #2 above turns into “Ifllp, worlb!” — and then the evolution continues because the children will finally be different from the parents once again. Mutation pushes the evolution forward.

How and when you mutate is up to you. Again, experiment. The code I’ll give you later has a very high mutation rate (50%), but that’s really just for demonstration. You might make it low, like 1%. My code makes only one letter move by one ASCII code but you can have yours be more radical. Experiment, test, and learn. It’s the only way.

Chromosomes: Summary

Chromosomes are representations of candidate solutions to your problem. They consist of the representation itself (in our case, a 13-character string), a cost or fitness score and function, the ability to mate (“crossover”), and the ability to mutate.

I like thinking about these things in OOP terms. The “Chromosome” class therefore has the following properties:

Properties:

Genetic code
Cost/fitness score
Methods:

Mate
Mutate
Calculate Fitness Score
We’ll now look at how to have genes interact with each other in the final piece of the GA puzzle: the “Population”.

The Population

The population is a group of chromosomes. The population generally remains the same size but will typically evolve to better average cost scores over time.

You get to choose your population size. I picked 20 for mine below, but you could choose 10 or 100 or 10,000 if you want. There are advantages and disadvantages, but as I’ve said a few times by now: experiment and learn for yourself!

The population experiences “generations”. A typical generation may consist of:

Calculating the cost/fitness score for each chromosome
Sorting the chromosome by cost/fitness score
Killing a certain number of the weakest members — you pick the number of chromosome that will die
Mating a certain number of the strongest members — again, you pick how you do this
Mutating members at random
Some kind of completeness test — ie, how do you determine when to consider the problem “solved”?

Google