Qualitative Methods and Natural-Language Processing

Questions of parameters for good outcomes of measurements beyond output-input

How can qualitative methods inform, enrich or question natural-language processing? This article is a short reflection on this question.

Natural-language processing (NLP) is a field of computer science, today largely driven by machine learning, that uses algorithms to analyse textual data.

Often this is thought of in terms of quantifying the data and identifying patterns in it.

This approach can help us analyse and understand language. It can also be used to generate language or to monitor language models that have already been deployed.

When working with large documents, for example, one can be left wondering where to start. It is possible to run unsupervised clustering of words or texts, although this is not entirely ‘unsupervised’.

Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision.

For this to work, someone has to program an algorithm to carry out a set of actions, and that person decides, for instance, how the text is represented and how many clusters to look for. ‘Unsupervised’ can therefore be slightly misleading unless you have some prior understanding of ML. Even mathematical decisions can rest on assumptions about how labels are created.
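To make the point concrete, here is a minimal k-means sketch in plain Python. The toy documents, the bag-of-words representation, Euclidean distance and the naive initialisation are all assumptions chosen for illustration, not a method taken from this article; each of them is a decision a person made before any ‘unsupervised’ learning happened.

```python
from collections import Counter
import math

# Hypothetical toy corpus, used only for illustration.
DOCS = [
    "loan interest bank rate",
    "bank loan credit rate",
    "insurance claim policy premium",
    "policy premium insurance cover",
]

def vectorize(docs):
    """Human choice 1: represent each text as a bag-of-words count vector."""
    vocab = sorted({w for d in docs for w in d.split()})
    vectors = []
    for d in docs:
        counts = Counter(d.split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def distance(a, b):
    """Human choice 2: Euclidean distance as the notion of similarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=10):
    """Human choice 3: the number of clusters k and the starting centroids."""
    centroids = [list(v) for v in vectors[:k]]  # naive, deterministic initialisation
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

vocab, vectors = vectorize(DOCS)
print(kmeans(vectors, k=2))
print(kmeans(vectors, k=3))
```

With k=2 the four toy texts separate into banking and insurance vocabulary; with k=3 the two banking texts are split apart. The data did not change, only a human-chosen parameter did.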

Part of the confusion may stem from the difference between the mathematical definition of bias and the notions of bias found in social science.

In mathematics it could be defined as follows:

A sample is "biased" if some members of the population are more likely to be included than others. A sample is "unbiased" if all members of the population are equally likely to be included.

In social science, bias can be constructed differently:

Bias is disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group, or a belief.

So, in that sense, bias could on the one hand be a systematic error in mathematics. On the other hand, it could be ‘unfair’ and relate to ideas or values.
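The statistical sense of bias can be illustrated with a small calculation. This sketch assumes a hypothetical population of five values and compares the expected value of a single draw when every member is equally likely to be picked with a draw in which some members are (hypothetically) easier to reach. It captures only the mathematical definition; it says nothing about whether the result is ‘unfair’ in the social-science sense.

```python
from fractions import Fraction

# Hypothetical toy population: one measured value per member.
population = [2, 4, 6, 8, 10]

def expected_draw(values, weights):
    """Expected value of one draw in which member i is picked
    with probability weights[i] / sum(weights)."""
    total = sum(weights)
    return sum(Fraction(w, total) * v for v, w in zip(values, weights))

true_mean = Fraction(sum(population), len(population))

# Unbiased sampling: every member is equally likely to be included.
unbiased = expected_draw(population, [1, 1, 1, 1, 1])

# Biased sampling: members with larger values are (hypothetically) easier
# to reach, for example prolific authors over-represented in a scraped corpus.
biased = expected_draw(population, [1, 1, 1, 2, 3])

print("true mean:", float(true_mean))
print("unbiased draw:", float(unbiased))
print("biased draw:", float(biased), "bias:", float(biased - true_mean))
```

Equal weights reproduce the population mean exactly, while the skewed weights shift the expectation upwards by 5/4. That shift is the systematic error the mathematical definition refers to.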

As recently demonstrated with GPT-3, even one of the most advanced NLP models can pick up and repeat existing attitudes or statements, with potentially adverse effects. There has been a great deal of progress, yet the considerations may differ depending on which notion of bias one has in mind.

Error, in applied mathematics, is the difference between a true value and an estimate, or approximation, of that value. In statistics, a common example is the difference between the mean of an entire population and the mean of a sample drawn from that population.
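The statistical example above can be sketched in a few lines of Python. The population is randomly generated, hypothetical data (think of document lengths in words), and the error is taken as sample mean minus population mean; that sign convention is a choice, not part of the definition.

```python
import random
import statistics

# Hypothetical population of 1,000 values, e.g. document lengths in words.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.randint(50, 500) for _ in range(1000)]

def sampling_error(pop, sample_size, gen):
    """Error = sample mean (the estimate) minus population mean (the true value)."""
    sample = gen.sample(pop, sample_size)
    return statistics.mean(sample) - statistics.mean(pop)

for n in (10, 100, 1000):
    print(n, round(sampling_error(population, n, rng), 2))
```

Larger samples tend to give smaller errors, and a "sample" that contains the whole population has an error of exactly zero.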

Similarly, because NLP is based on statistics and mathematics, it is often the mean error that counts. If the model makes more correct decisions than decisions considered ‘bad’, it works, at least to some extent.
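A minimal sketch, on made-up toy data, of why a good average can still hide problems: a hypothetical classifier is right in 8 of 10 cases overall, but all of its mistakes fall on one group. The group labels and records are invented for illustration and are not results from any real model.

```python
# Hypothetical predictions from a text classifier, tagged by a group attribute.
# Each record: (group, true_label, predicted_label). Toy data only.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0), ("A", 1, 1),
    ("A", 0, 0), ("A", 1, 1), ("A", 0, 0), ("B", 1, 0), ("B", 0, 1),
]

def accuracy(rows):
    """Share of records where the prediction matches the true label."""
    return sum(true == pred for _, true, pred in rows) / len(rows)

overall = accuracy(records)
by_group = {
    g: accuracy([r for r in records if r[0] == g])
    for g in sorted({r[0] for r in records})
}
print(overall)
print(by_group)
```

The overall accuracy is 0.8, yet group A is always classified correctly and group B never is. Whether that counts as "working" depends on who is asking.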

Then again, as mentioned, what counts as good or bad is often defined by the applied mathematician or statistician. It could equally be defined by parameters handed to these professionals by leaders with specific requirements. Bank loans, insurance, interaction and sales are examples of environments, each with its own conditions.
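Who decides what counts as ‘bad’ can come down to a single number. This sketch uses hypothetical loan-repayment scores and outcomes; the threshold values stand in for the kind of parameter that might be handed to a statistician by management, and none of it reflects a real lending model.

```python
# Hypothetical model scores (estimated chance of repaying a loan) and actual outcomes.
applicants = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, True), (0.10, False),
]

def outcomes(threshold):
    """Approve when score >= threshold; count the two kinds of mistake."""
    wrongly_approved = sum(1 for s, repaid in applicants if s >= threshold and not repaid)
    wrongly_rejected = sum(1 for s, repaid in applicants if s < threshold and repaid)
    return wrongly_approved, wrongly_rejected

for t in (0.3, 0.5, 0.7):
    print(t, outcomes(t))
```

In this toy data every threshold produces two mistakes in total, but they fall on different people: approved applicants who did not repay versus rejected applicants who would have. The model is the same; the choice of threshold decides who bears the error.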

Given these possible adverse conditions, a qualitative researcher may need to look for fault lines beyond the vehicle or delivery being measured. How far this is possible depends on the relationship between the researcher and the organisation. It can equally be a question of parameters for good outcomes beyond what is measured by the output-input interplay of the many interfaces within which algorithms operate: the developer, management, customers, citizens, clients and so on.

This is #500daysofAI and you are reading article 431. I am writing one new article about or related to artificial intelligence every day for 500 days.

AI Policy and Ethics at www.nora.ai. Student at University of Copenhagen MSc in Social Data Science. All views are my own. twitter.com/AlexMoltzau