Posted on April 28, 2008 by Bin
Posted on April 27, 2008 by Bin
Posted on March 7, 2008 by Bin
It all started with an email I sent to our group mailing list about the following interesting passage:
Don’t delete this just because it looks weird. Believe it or not, you can read it:
I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid Aoccdrnig to rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae.The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh?
Then people in our group began to post interesting comments on this passage
Filed under: Research | Leave a Comment »
Posted on March 6, 2008 by Bin
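Gaussian distribution for a univariate random variable: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

Gaussian distribution for a D-dimensional random variable: $\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}$
Conditional distribution (with $x$, $\mu$, $\Sigma$ and $\Lambda$ partitioned into blocks $a$ and $b$):
$p(x_a \mid x_b) = \mathcal{N}(x_a \mid \mu_{a|b}, \Lambda_{aa}^{-1})$
where
$\mu_{a|b} = \mu_a - \Lambda_{aa}^{-1} \Lambda_{ab} (x_b - \mu_b)$
where $\Lambda = \Sigma^{-1}$ is the precision matrix.
Marginal distribution:
$p(x_a) = \mathcal{N}(x_a \mid \mu_a, \Sigma_{aa})$

Let
$p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$, and
$p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$.
Marginal distribution of
$y$:
$p(y) = \mathcal{N}(y \mid A\mu + b, L^{-1} + A \Lambda^{-1} A^T)$
Conditional distribution of
$x$ given $y$:
$p(x \mid y) = \mathcal{N}(x \mid \Sigma \{A^T L (y - b) + \Lambda \mu\}, \Sigma)$
where, in this last formula, $\Sigma = (\Lambda + A^T L A)^{-1}$.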
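As a small sanity check on the conditional formulas, here is a minimal sketch for the simplest case, a 2-dimensional Gaussian whose blocks a and b are single scalars. The function names and the example numbers are made up for illustration. It computes the conditional of x_a given x_b twice, once from the covariance blocks and once from the precision matrix, and the two results should agree.

```python
def conditional_from_covariance(mu, cov, x_b):
    """Mean and variance of x_a given x_b, using blocks of the covariance."""
    mu_a, mu_b = mu
    (s_aa, s_ab), (s_ba, s_bb) = cov
    mean = mu_a + s_ab / s_bb * (x_b - mu_b)
    var = s_aa - s_ab * s_ba / s_bb
    return mean, var


def conditional_from_precision(mu, cov, x_b):
    """Same quantity, using blocks of the precision matrix (inverse covariance)."""
    mu_a, mu_b = mu
    (s_aa, s_ab), (s_ba, s_bb) = cov
    det = s_aa * s_bb - s_ab * s_ba
    # Entries of the inverse of a 2x2 matrix that are needed here.
    l_aa = s_bb / det
    l_ab = -s_ab / det
    mean = mu_a - (l_ab / l_aa) * (x_b - mu_b)
    var = 1.0 / l_aa
    return mean, var


# Illustrative numbers only.
mu = (1.0, -1.0)
cov = ((2.0, 0.6), (0.6, 1.0))
mean_cov, var_cov = conditional_from_covariance(mu, cov, 0.5)
mean_prec, var_prec = conditional_from_precision(mu, cov, 0.5)
```

The marginal of x_a needs no computation at all in this setting: it is simply the Gaussian with mean `mu[0]` and variance `cov[0][0]`.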
Filed under: Research | Leave a Comment »
Posted on October 17, 2007 by Bin
A nonparametric method that directly produces resampling weights without distribution estimation. The idea is distribution matching: the weights are chosen so that the means of the training and test points in a reproducing kernel Hilbert space are close.
The authors even claim that their method can in some cases outperform reweighting with the true sample bias distribution. But why is this possible?
The support of Pr’ is contained in the support of Pr.
Two major drawbacks of estimating the distributions and reweighting by their ratio: 1. the estimates of Pr and Pr’ have to be good; 2. it is overkill.
Core theorem: if we find the solution of the following problem, then \beta is the resampling weight we need.
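Assuming these notes are about kernel mean matching, the problem in question can be sketched as below. The notation is an assumption and not taken from the notes: $\Phi$ is the feature map into the RKHS and $\mu(\Pr') = E_{x \sim \Pr'}[\Phi(x)]$ is the mean of the test distribution in feature space.

```latex
\min_{\beta} \; \left\| \mu(\Pr') - E_{x \sim \Pr}\left[ \beta(x)\, \Phi(x) \right] \right\|
\quad \text{subject to} \quad
\beta(x) \ge 0, \qquad E_{x \sim \Pr}\left[ \beta(x) \right] = 1
```

The two constraints simply say that $\beta(x)\Pr(x)$ is again a probability distribution.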
Filed under: Uncategorized | Leave a Comment »
Posted on October 16, 2007 by Bin
Several helpful links to this topic:
Filed under: Research | Leave a Comment »
Posted on October 15, 2007 by Bin
This is a NIPS paper from this year. Unlike our work on finding a good representation for domain adaptation, the authors propose a way to evaluate a representation and ask the question “under what conditions can we adapt a classifier trained on the source domain for use in the target domain?”.
As we know, a critical point in transfer learning is how to measure the difference between two domains. This paper proposes the A-distance as that measure. Unlike the usual distances between two distributions, such as KL divergence, the A-distance only cares about the difference in probability on a chosen collection $\mathcal{A}$ of subsets of the probability space. The definition is as follows:
$d_{\mathcal{A}}(D, D') = 2 \sup_{A \in \mathcal{A}} \left| \Pr_D[A] - \Pr_{D'}[A] \right|$
Defining the distance over every point, as KL divergence does, is sometimes too strong. In those cases the A-distance may be a better choice.
To actually compute the A-distance, the problem is converted into a binary classification problem of discriminating the source domain from the target domain. The idea is straightforward: if two distributions are similar, it is hard to tell apart the samples drawn from them, and vice versa.
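The classification view can be illustrated with a rough sketch. It labels source points 0 and target points 1, trains a classifier on half of the pooled data, and plugs the held-out error `err` into `2 * (1 - 2 * err)`. The nearest-centroid classifier, the function names and the toy Gaussian data are all assumptions chosen to keep the example short; they are not the setup used in the paper. A single fixed classifier also only approximates the minimum error over a hypothesis class.

```python
import random


def gaussian_cloud(center, n, rng):
    """Draw n 2-D points from a unit-variance Gaussian around center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


def proxy_a_distance(source, target, rng):
    """Estimate 2 * (1 - 2 * err) with a nearest-centroid domain classifier."""
    data = [(x, 0) for x in source] + [(x, 1) for x in target]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def centroid(label):
        pts = [x for x, y in train if y == label]
        return [sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0]))]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    # Predict "target" (1) when the point is closer to the target centroid.
    errors = sum((1 if sq_dist(x, c1) < sq_dist(x, c0) else 0) != y for x, y in test)
    err = errors / len(test)
    return 2.0 * (1.0 - 2.0 * err)


rng = random.Random(0)
same = proxy_a_distance(gaussian_cloud((0, 0), 1000, rng),
                        gaussian_cloud((0, 0), 1000, rng), rng)
far = proxy_a_distance(gaussian_cloud((0, 0), 1000, rng),
                       gaussian_cloud((6, 6), 1000, rng), rng)
```

With identical distributions the classifier is at chance, so `same` comes out near 0. With well-separated distributions the error is near 0, so `far` comes out near 2.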
The paper provides a bound on the target domain error, based on the assumption that there exists a good classifier in the chosen hypothesis set. The bound is controlled by the classification error on the source domain data and the A-distance between the two distributions. It seems reasonable.
Generally speaking, this paper is interesting. However, it lacks some mathematical details.
Filed under: Readings, Research | Leave a Comment »
Posted on October 9, 2007 by Bin
I learned a trick to convert a minimax problem into a minimum problem today. That is, if we have the problem of
$\max_{x\in F} (\min_{i\in [1,n]} x_i)$
Then it is equivalent to the following minimum problem
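One common way to write such an equivalent problem, which may differ from the exact form meant here, introduces an auxiliary scalar $t$ (a name chosen for this sketch) that lower-bounds every coordinate:

```latex
\max_{x \in F} \; \min_{i \in [1,n]} x_i
\;=\;
-\min_{x \in F,\; t \in \mathbb{R}} \; (-t)
\quad \text{subject to} \quad x_i \ge t \;\; \text{for all } i \in [1,n]
```

At the optimum $t$ is pushed up until it equals the smallest coordinate of $x$, so the inner minimum disappears and only a single minimization with $n$ extra constraints is left.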

This is a very useful trick. Consider the problem of data selection in SVM. We’d like to select the subset of the data that maximizes the margin, while the linear separator itself comes from a minimization.
SVM:
minimize
$\frac{1}{2}\|w\|^2$, subject to $y_i (w^T x_i + b) \ge 1$ for all $i$
Subset SVM:

Therefore, this problem can be converted to

But is it an easier problem to solve? I’m not sure at present.
Filed under: Uncategorized | Leave a Comment »
Posted on October 7, 2007 by Bin
Although SVMlight supports giving each example a different weight in the cost function, this interface is not described on its website or in the related documents. But you can find it in the code. The interface is easy to use: you just need to change the input data file format to:
<line> .=. <target> cost:<weight> <feature>:<value> <feature>:<value> … <feature>:<value> # <info>
The weight by default is 1, of course.
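A small helper for producing lines in this format might look like the sketch below. The function name and argument layout are made up for illustration. It omits the `cost:` field when the weight is the default of 1 and writes features in increasing index order.

```python
def format_svmlight_line(target, features, weight=1.0, info=None):
    """Build one weighted training line: <target> cost:<weight> <feature>:<value> ... # <info>.

    target   -- class label as an int, e.g. +1 or -1
    features -- dict mapping feature index (int) to value (float)
    weight   -- per-example cost factor; left out of the line when it is 1
    info     -- optional free-text comment appended after '#'
    """
    parts = [f"{target:+d}"]
    if weight != 1.0:
        parts.append(f"cost:{weight:g}")
    # Write features sorted by index.
    parts.extend(f"{idx}:{val:g}" for idx, val in sorted(features.items()))
    line = " ".join(parts)
    if info:
        line += f" # {info}"
    return line


print(format_svmlight_line(1, {3: 1.0, 1: 0.5}, weight=2.5))
# +1 cost:2.5 1:0.5 3:1
```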
Filed under: Research | Leave a Comment »
Posted on July 17, 2007 by Bin
Hal Daume III gave an excellent tutorial on Bayesian Techniques for NLP at MSRA. Although the title mentions NLP, the tutorial is really all about graphical models. Hal showed many details that you usually cannot get directly from the papers. He also gave us vivid illustrations of abstract ideas such as Metropolis-Hastings sampling and Gibbs sampling.
Why is the acceptance probability in Metropolis-Hastings sampling
$A(x' \mid x) = \min\left(1, \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)$? There is a mathematical reason for this formula, but the following analysis gives more intuition for it.
At the current step we hold the sample x. Suppose we draw x’ from
$q(x' \mid x)$; the question is whether we should accept this sample. If p(x’) is high, x’ is probably a good sample for the distribution p, so we had better accept it. If p(x) is low, the current point is not a good place from which to generate samples for p, so we had better move to the new sample x’. If q(x’|x) is high and q(x|x’) is low, it is very likely that we move from x to x’ and never come back, so the move is probably too risky.
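To make the acceptance rule concrete, here is a minimal Metropolis-Hastings sketch. The function names, the standard normal target and the Gaussian random-walk proposal are assumptions chosen for illustration, not something from the tutorial. The acceptance step works in log space and accepts with probability min(1, p(x')q(x|x') / (p(x)q(x'|x))).

```python
import math
import random


def metropolis_hastings(log_p, propose, log_q, x0, n_steps, rng):
    """Run a Metropolis-Hastings chain and return the visited states.

    log_p(x)        -- log of the (possibly unnormalized) target density
    propose(x, rng) -- draws a candidate x' from q(x'|x)
    log_q(a, b)     -- log q(a|b), up to a constant shared by both directions
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x, rng)
        # log of p(x') q(x|x') / (p(x) q(x'|x))
        log_ratio = (log_p(x_new) + log_q(x, x_new)) - (log_p(x) + log_q(x_new, x))
        # Accept with probability min(1, ratio); otherwise stay at x.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
        samples.append(x)
    return samples


def log_p_normal(x):
    """Unnormalized log density of a standard normal target."""
    return -0.5 * x * x


def propose_walk(x, rng):
    """Gaussian random-walk proposal with unit step size."""
    return x + rng.gauss(0.0, 1.0)


def log_q_walk(a, b):
    """log q(a|b) for the random walk, up to an additive constant."""
    return -0.5 * (a - b) ** 2


rng = random.Random(0)
samples = metropolis_hastings(log_p_normal, propose_walk, log_q_walk, 0.0, 20000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The random-walk proposal is symmetric, so the two q terms cancel and the rule reduces to comparing p(x') with p(x). The q terms start to matter with an asymmetric proposal, which is exactly the "likely to go but unlikely to come back" case discussed above.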
Filed under: Uncategorized | 1 Comment »