Notes on Computing Classes and Classification in NLP
Notes from a lecture in IN2110 at the University of Oslo about classes and classification
This article was written to learn more about classes and classification. If you enjoy this article let me know, and if you see anything lacking then please tell me. Whenever I write ‘according to the lecture’ I am referring to the lecture at the University of Oslo from the course module IN2110. I previously wrote an article called Vector Space for Information Retrieval from the same course module, and this is a follow-up to that article. I am currently looking into natural language processing and foundational theory and practice within this field (also known as computational linguistics).
In a vector space model objects can be represented as points.
This can be used for the practical purpose of querying documents.
Classes can correspond to collections of points.
These can be called regions.
Vector space classification is based on the contiguity hypothesis.
“In cognitive science, association by contiguity is the principle that ideas, memories, and experiences are linked when one is frequently experienced with the other. For example, if you constantly see a knife and a fork together they become linked (associated).”
In this sense, objects in the same class form a contiguous region: a region of assumed similarity in the vector space, based on prior occurrences or assumptions.
In the lecture it was said:
“Objects in the same class form a contiguous region, and regions of different classes do not overlap.”
In computing, a decision boundary is the surface that separates classes.
→ How do we choose to represent classes?
Boundaries are influenced by this question.
There are several ways to represent classes.
One is exemplar-based.
“Exemplar theory is a proposal concerning the way humans categorize objects and ideas in psychology. It argues that individuals make category judgments by comparing new stimuli with instances already stored in memory.”
As such exemplar theory assumes a similarity-based categorisation strategy.
Every stored instance of a group can potentially represent the class.
This is used in instance-based or memory-based learning (MBL).
A memory-based learning system is an extended memory management system that decomposes the input space either statically or dynamically into subregions for the purpose of storing and retrieving functional information.
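As a minimal sketch of the memory-based idea, a 1-nearest-neighbour classifier keeps every training instance in memory and labels a new object by its single closest stored exemplar. The function names and the use of Euclidean distance are my own assumptions, not from the lecture:

```python
import numpy as np

def nearest_exemplar_label(memory_X, memory_y, x):
    """Classify x by the label of its single closest stored exemplar
    (1-nearest-neighbour; a minimal memory-based learning sketch)."""
    X = np.asarray(memory_X, dtype=float)
    # Euclidean distance from x to every stored instance.
    dists = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
    return memory_y[int(np.argmin(dists))]
```

Here every stored instance can potentially decide the class, which matches the exemplar-theory view above.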
In the lecture it was written:
“Another variant is to use medoids — representing a class by a single member that is considered central, typically the object with maximum average similarity to other objects in the group.”
Medoids: representative objects of a data set, or of a cluster within a data set, whose average dissimilarity to all the objects in the cluster is minimal.
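A small sketch of how a medoid can be found: compute all pairwise distances and pick the member whose average distance to the others is minimal. Euclidean distance is an assumption here; any dissimilarity measure could be used:

```python
import numpy as np

def medoid(points):
    """Return the member of `points` whose average distance
    to the other members is minimal (Euclidean distance assumed)."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix via broadcasting.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Average dissimilarity of each member to the others.
    avg = dists.mean(axis=1)
    return points[np.argmin(avg)]
```

Unlike a centroid, the result is always an actual member of the group.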
Another way of representing classes is centroid-based: the class is represented by its average, the centre of mass of the region.
Given a class Ci, where each member object Oj is represented as a feature vector xj, one can compute the class centroid µi as the average of the member vectors: µi = (1 / |Ci|) Σ xj, summing over all xj in Ci.
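The centroid is simply the component-wise mean of the member vectors, which a couple of lines of NumPy can express (a sketch; the function name is illustrative):

```python
import numpy as np

def class_centroid(vectors):
    """Centroid µ of a class: the component-wise mean of its feature vectors."""
    return np.mean(np.asarray(vectors, dtype=float), axis=0)
```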
Both centroids and medoids represent a group by a single prototype.
→ Medoid: actual member of the group.
→ Centroid: abstract prototype; an average.
Typicality: can be defined by a member’s distance to the prototype.
“Typicality degrees are defined to build prototypes that characterise data subcategories, taking into account both the common points of the category members and their distinctive features as compared to other categories.”
It is possible to weight centroids, according to lecture:
“Let each member’s contribution to the average be determined by its average pairwise similarity to the other members of the group.”
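A sketch of this weighting idea, assuming cosine similarity as the pairwise similarity measure (the lecture quote does not specify one):

```python
import numpy as np

def weighted_centroid(vectors):
    """Weighted centroid: each member's contribution to the average is
    determined by its average cosine similarity to the other members.
    A sketch of the lecture's idea; cosine similarity is an assumption."""
    X = np.asarray(vectors, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = unit @ unit.T                       # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)                # exclude self-similarity
    weights = sims.sum(axis=1) / (len(X) - 1)  # average similarity to others
    weights = weights / weights.sum()          # normalise weights to sum to 1
    return weights @ X
```

Atypical members (low similarity to the rest) pull the centroid less than typical ones.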
How does one represent typicality?
There are discussions on this within linguistic and psychological prototype theory.
One classification method built on such prototypes is Rocchio classification.
“The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems which stemmed from the SMART Information Retrieval System which was developed 1960–1964. Like many other retrieval systems, the Rocchio feedback approach was developed using the Vector Space Model.”
This uses centroids to represent classes.
Rocchio classification is also known as the nearest centroid classifier.
The Rocchio decision rule:
→ To classify a new object Oj (represented by a feature vector xj),
→ determine which centroid µi that xj is closest to,
→ and assign it to the corresponding class Ci.
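The decision rule above can be sketched in a few lines: compute one centroid per class at training time, then assign a new vector to the class with the closest centroid. Euclidean distance and the function names are my assumptions:

```python
import numpy as np

def rocchio_train(X, y):
    """Compute one centroid per class label (nearest centroid classifier)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def rocchio_predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    x = np.asarray(x, dtype=float)
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
```

Note that no boundary is computed anywhere; the boundary is only implicit in the "closest centroid" comparison.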
The decision boundary of the Rocchio classifier between two classes is defined by the set of points equidistant from the two centroids.
“A point is said to be equidistant from a set of objects if the distances between that point and each object in the set are equal. In two-dimensional Euclidean geometry, the locus of points equidistant from two given points is their perpendicular bisector.”
In two dimensions, this set of points corresponds to a line; in a higher-dimensional space, it corresponds to a hyperplane.
The boundaries are not explicitly computed; they are implied by the decision rule.
However, there are problems with the Rocchio classifier. The lecture slides state:
“The classification decision ignores the distribution of members locally within a class, only based on the centroid distance.”
In other words, the decision is based only on the distance to the centroid, ignoring how members are distributed locally within the class.
This implicitly assumes that classes are spheres with similar radii.
This may not be the case.
It does not work well for classes that cannot be accurately represented by a single prototype or centre (e.g. disconnected or elongated regions).
This can be illustrated by different cases, all of which are from the lecture:
Problematic: Elongated regions.
Problematic: Non-contiguous regions.
Problematic: Different sizes.
Problematic: Nonlinear boundary.
A side-note on non-linearity.
“Before we turn to talk about non-linear classifiers, note that: Classes that are not linearly separable in a given feature space may become linearly separable when the features are mapped to a higher-dimensional space (this is the basis for so-called kernel methods).”
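A toy illustration of this idea: two classes separated by a circle in 2D are not linearly separable, but after mapping each point (x, y) to (x, y, x² + y²) a flat plane can separate them. This explicit feature map only illustrates the intuition; actual kernel methods achieve the same effect implicitly:

```python
import numpy as np

def lift(points):
    """Map 2D points (x, y) to 3D points (x, y, x**2 + y**2).
    Classes separated by a circle of radius r in 2D become separable
    by the flat plane z = r**2 in the lifted space."""
    P = np.asarray(points, dtype=float)
    return np.column_stack([P, (P ** 2).sum(axis=1)])
```

For points inside the unit circle the third coordinate is below 1, and for points outside it is above 1, so the plane z = 1 separates the lifted classes.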
What other forms of classification do we have?
Let’s get to this in a different article.
Hope you enjoyed this one.
Again, if you enjoy this article let me know, and if you see anything lacking then please tell me.
I write to learn and enjoy every second, so feedback makes me happy.
This is #500daysofAI and you are reading article 447. I am writing one new article about or related to artificial intelligence every day for 500 days.