Having the Right Measure for IR
This may be my first post as an IR researcher. I just joined the Information Retrieval Lab at UMass, and I am busy getting used to life in the USA while starting my career as a researcher.
While I have always considered blogging a good pastime, I decided that I may even need to blog for the sake of my research. It may help me develop immature research ideas, learn how others think differently, and see things from a more relaxed perspective; research should not feel like work if it is to be fruitful.
As a novice blogger, I started reading what others write about research. An article about choosing the right measure drew my attention today. In most AI-like problems, having the right measure is critical, since otherwise you would be optimizing in the wrong direction. Conversely, once you have the right measure, you can improve your results and tell how well your approach works.
The author (Hal) says that the F-measure (the weighted harmonic mean of precision and recall) is desirable for classification problems where the problem space can be divided into answer vs. non-answer, in addition to the well-known rarity reason: accuracy is useless when one class (usually non-answer) makes up the majority. This assertion seems semantically sound, in the sense that precision and recall, the components of the F-measure, are defined assuming exactly the division of the problem space that the author suggests.
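To make the rarity point concrete, here is a minimal sketch in Python. The function name `precision_recall_f` and the toy labels are my own, made up for illustration; the formula is the standard weighted harmonic mean, with `beta` controlling how much recall counts relative to precision.

```python
def precision_recall_f(gold, pred, beta=1.0):
    """Return (accuracy, precision, recall, F_beta) for binary labels (1 = answer)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return accuracy, precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return accuracy, precision, recall, f


# Toy data (hypothetical): only 5 answers among 100 items.
gold = [1] * 5 + [0] * 95

# Classifier A never says "answer".
always_no = [0] * 100
# Classifier B finds 4 of the 5 answers but also raises 4 false alarms.
finds_most = [1] * 4 + [0] + [1] * 4 + [0] * 91

result_a = precision_recall_f(gold, always_no)
result_b = precision_recall_f(gold, finds_most)
print("A: accuracy=%.2f F1=%.2f" % (result_a[0], result_a[3]))
print("B: accuracy=%.2f F1=%.2f" % (result_b[0], result_b[3]))
```

Both classifiers get the same accuracy on this toy data, yet only the F-measure shows that the first one never finds a single answer.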
But in another post, Chris Manning (an NLP textbook author!) restricts the usefulness of the F-measure to cases where there are no partial-match problems; for example, using the F-measure for an NER task might be problematic.
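As I understand the partial-match concern, it can be sketched as follows. This is my own illustration, not code from either post: the `span_f1` helper and the spans are hypothetical, and I assume strict exact-match scoring, where a predicted entity counts only if its boundaries and type both match the gold annotation.

```python
def span_f1(gold, pred):
    """Exact-match precision, recall, and F1 over sets of (start, end, type) spans."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold annotation: two entities, given as (start, end, type).
gold = {(0, 2, "ORG"), (5, 6, "PER")}

# System A finds both entities but gets one boundary slightly wrong.
near_miss = {(0, 2, "ORG"), (5, 7, "PER")}
# System B simply skips the second entity.
skipped = {(0, 2, "ORG")}

f1_near_miss = span_f1(gold, near_miss)[2]
f1_skipped = span_f1(gold, skipped)[2]
print("near miss: %.3f, skipped: %.3f" % (f1_near_miss, f1_skipped))
```

Under this scoring the near miss is counted as both a false positive and a false negative, so the system that skips the entity entirely ends up with the higher F1, which hardly matches intuition about which output is better.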
Back in the IR field, I have started to think about the problem with the dominant measure in IR: MAP (mean average precision). It assumes binary relevance judgments, which is quite naive given the complex notion of relevance. As newer metrics such as NDCG (normalized discounted cumulative gain) become widely adopted by the IR community, this limitation of MAP will become less significant.
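A small sketch of the difference, under a few assumptions of mine: the function names and the graded judgments are invented for illustration, the DCG here uses the gain / log2(rank + 1) form (other formulations exist), and average precision is computed for a single query, with MAP being its mean over many queries.

```python
import math


def average_precision(rels):
    """AP for one ranked list of binary judgments (1 = relevant).

    Assumes every relevant document appears somewhere in the list.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg(gains):
    """NDCG for one ranked list of graded gains (higher = more relevant)."""
    def dcg(gs):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gs, start=1))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Two hypothetical rankings of the same three documents, with graded
# judgments: 3 = highly relevant, 1 = marginally relevant, 0 = not relevant.
best_first = [3, 1, 0]
best_second = [1, 3, 0]

# Binary judgments throw the grades away: anything above 0 is "relevant".
ap_first = average_precision([1 if g > 0 else 0 for g in best_first])
ap_second = average_precision([1 if g > 0 else 0 for g in best_second])

ndcg_first = ndcg(best_first)
ndcg_second = ndcg(best_second)
print("AP:   %.3f vs %.3f" % (ap_first, ap_second))
print("NDCG: %.3f vs %.3f" % (ndcg_first, ndcg_second))
```

With binary judgments the two rankings are indistinguishable, while NDCG rewards the one that puts the highly relevant document first.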
Tags: IR, Essay