Beyond Buzzwords: Natural Language Processing

Naira Musallam, PhD | March 12, 2019

Out of the Weeds, Part II

In the first article of our Beyond Buzzwords series, we set out to demystify the meaning of machine learning, showcasing its relevance and practicality in the consumer insights space.
This second installment is all about doing the same with natural language processing (NLP), also known as text analytics.

Natural language processing is widely acknowledged as a subfield of artificial intelligence, focused on enabling computers to process and understand human languages. This allows NLP to perform tasks like translation, semantic analysis, text classification, extraction, and summarization.

In practice, NLP relies on multiple disciplines, including computer science, statistics, and linguistics, along with substantial computational power, to understand human communication.

When we talk about NLP, we are usually referring to:

  • Content & topic categorization: the ability to organize text into meaningful themes or categories, whether behaviors, products, or any other organizing factor that matters to the end user.
  • Speech-to-text and text-to-speech: converting audio into written text and vice versa.
  • Document summarization: the ability to extract or generate accurate summaries from large quantities of text.
  • Named entity recognition and part-of-speech (grammatical) tagging: identifying entities such as names and organizations, and labeling each word's grammatical role.
  • Sentiment analysis: identifying emotional reactions, including the types of emotions, their frequency, and their intensity (a short sketch follows this list).
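
To make that last bullet concrete, here is a minimal sentiment-scoring sketch in Python using NLTK's VADER lexicon; the example reviews are invented for illustration:

```python
# Minimal sentiment analysis sketch using NLTK's VADER lexicon.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

reviews = [
    "I absolutely love this product!",
    "The packaging was damaged and support never replied.",
]
for review in reviews:
    # 'compound' ranges from -1 (most negative) to +1 (most positive)
    score = analyzer.polarity_scores(review)["compound"]
    print(f"{score:+.2f}  {review}")
```

Production systems tend to use trained classifiers rather than a pure lexicon, but the shape is the same: raw text in, a scored judgment out.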

You might be asking yourself: this sounds great, but how does it work? How could a machine understand human language? What types of processes and algorithms are applied?

You might be surprised to know that accurate and actionable NLP outcomes often take hard work. Specifically, the work of computer scientists, engineers, linguists, and industry-specific experts, who do a significant amount of manual (and not-so-sexy) work to get the software to perform “artificially intelligent” tasks.

So, let’s say you have a million product reviews or 1,000 pages of consumer interview transcripts to analyze, and you would like to extract the sentiments and/or understand the most popular topics.

If the NLP software you are using is any good, the first step is to clean the data. Just like organizing a messy Excel sheet, NLP software combs through your text to clean it up, or at least reduce the level of noise to a minimum.

This critical first step is called pre-processing, and it involves both “normalization” and “tokenization.”

Normalization involves tasks like removing non-alphabetical characters, converting letters to lowercase, removing stop words (e.g. the, a, in, etc.), converting numbers to words, and stemming and lemmatization.
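
For a rough sense of what normalization looks like in practice, here is a minimal Python sketch using NLTK's English stop-word list; the sample sentence is invented:

```python
# Minimal normalization sketch: lowercase, strip non-alphabetic
# characters, and remove stop words.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def normalize(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits and punctuation
    return [word for word in text.split() if word not in STOP_WORDS]

print(normalize("The battery lasted 3 days, which was AMAZING!"))
# -> ['battery', 'lasted', 'days', 'amazing']
```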

For further context, stemming and lemmatization both reduce words to a common base form: the “stem” or “lemma.” The “stem” is the part of the word to which you add inflectional affixes such as -ed, -ize, or mis-. Sometimes this results in strings that are not actual words. The “lemma” is the base or dictionary form of the word.
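
The difference is easiest to see side by side. A quick sketch using NLTK's PorterStemmer and WordNetLemmatizer (the word list is arbitrary):

```python
# Stemming vs. lemmatization: the stemmer can return non-words
# ("studi"), while the lemmatizer returns dictionary forms.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "running", "argued"]:
    print(word,
          "| stem:", stemmer.stem(word),
          "| lemma:", lemmatizer.lemmatize(word, pos="v"))
```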


Tokenization refers to segmenting text into smaller chunks: paragraphs can be tokenized into sentences, and sentences into individual words or phrases. Those tokens can then be tagged with anything meaningful to the end user (e.g. named entities, sentiments, parts of speech, or behaviors).
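
As an illustration, the sketch below tokenizes a paragraph into sentences, the sentences into words, and tags each token with its part of speech (NLTK's downloadable resource names vary slightly between versions):

```python
# Tokenization sketch: paragraph -> sentences -> word tokens -> POS tags.
import nltk

for resource in ("punkt", "averaged_perceptron_tagger"):
    nltk.download(resource, quiet=True)

paragraph = "The new blender is fantastic. Shipping, however, was slow."
for sentence in nltk.sent_tokenize(paragraph):
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
```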

While there are readily available libraries with code and algorithms that can perform the above tasks, if you are building your own lexical analysis framework, you should tokenize your own text.

You might want to do this either because your framework is new or because you want to enhance accuracy. Alternatively, you could work with a software platform that already has custom data sets relevant to your space built in.

Tokenized and tagged text becomes a “golden” (gold-standard) dataset, which is then used to “train” a statistical model that can be applied to any new text. This is where you may come across the term “supervised machine learning.”
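
As a toy illustration of that training step, the sketch below fits a scikit-learn pipeline on a handful of hand-labeled reviews, all invented here; a real gold dataset would contain thousands of examples:

```python
# Supervised learning sketch: a tiny labeled "gold" dataset trains a
# TF-IDF + logistic regression model, which then scores unseen text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["love it", "works great", "total waste of money", "broke in a week"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["great value, love the design"]))  # ['positive']
```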

Depending on what you are trying to achieve, there are a variety of statistical models that can be applied. These range from logistic regression to Support Vector Machines (SVMs) to deep neural networks.

The type of statistical model you choose depends on the structure and complexity of your data and, frankly, on continuous experimentation to increase accuracy.
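
One common way to run that experimentation is cross-validation: train each candidate model on part of the labeled data and score it on the held-out rest. A sketch, again with invented data (far too little for the numbers to mean anything, but the pattern scales):

```python
# Compare three model families by cross-validated accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "love it", "works great", "excellent quality", "super easy to use",
    "highly recommend", "total waste of money", "broke in a week",
    "terrible support", "arrived damaged", "would not buy again",
]
labels = ["pos"] * 5 + ["neg"] * 5

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVM": LinearSVC(),
    "neural network": MLPClassifier(max_iter=2000),
}
for name, classifier in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    scores = cross_val_score(pipeline, texts, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```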

Hopefully, you now feel a little better prepared to know what is available for your research and, more importantly, are able to evaluate future solutions with a clearer understanding of the science and technology supporting them.

Naira Musallam, PhD


More from SightX


Five Ways to Get Creative with Heatmaps

We’ve said it before, and we'll most likely say it again: consumers are changing.

It should come as no surprise that consumer behavior has evolved quite a bit in recent years, but that evolution was fast-tracked in 2020. From where they shop to how they want to connect with their favorite brands, consumers demand engagement on their terms.

Effective engagement can mean speed and efficiency, but more often than not, it also demands creativity.

For insights teams, in particular, this can be a challenge. However, a modern, effective, and creative way to get impactful feedback from consumers is through a heatmap experiment.

A heatmap is a visual storytelling exercise. It organizes data about an image using color-coded zones representing the frequency of activities, interactions, or sentiments.

Historically, heatmaps have been a popular visualization tool for data-driven researchers across industries, and given current consumer trends, their use continues to grow. While they remain a key tool in user interface and experience research, they are increasingly applied to concept and product testing as well.

To help spark some creativity and curiosity, we’ve put together a list of simple ways you can incorporate heatmap techniques in your own research:

Whitespace & Prototype Testing
Exploring white space and researching prototypes are important initial steps in the product innovation process. If you have initial ideas or mock-ups for a product, heatmaps can be an important early indicator of which attributes your potential customers would be compelled by, or (just as importantly) repelled by.

Efficient and effective prototype feedback allows you to refine your products earlier in the development process, before you even begin building your minimum viable product (MVP).

Design Testing
Getting feedback on visual design elements like fonts, colors, layouts, and imagery is an important step in the research process, and heatmap experiments are one of the most cost- and time-efficient ways to do it.

Using heatmaps for design testing allows you to identify what works and what doesn’t for any customer-facing visuals.


Package Testing
Most products go through many iterations of packaging designs before launch. Testing various concepts with heat mapping allows you to gain detailed insights into potential customers' preferences surrounding specific packaging attributes.

Respondents have the opportunity to select and react to design elements, logo placements, packaging types, and other details - allowing you to understand where consumers focus their attention and in what order.

Ad & Message Testing
Your go-to-market messaging and content strategy can make or break your product launch. However, message testing isn’t just about the words themselves - the taglines, logos, and other copy in the ad are just as important as the package and product designs.

Using heatmaps, you can test which ad or message garners the most positive or frequent interaction, and which drives more viewers to engage with the Call-to-Action. Consumers indicate to researchers where the messaging is catching their attention, if that attention is positive or negative, and why they feel that way.

Shelf Placement
Even though most of us are primarily shopping online, the in-store experience cannot be overlooked. Pandemics aside, consumers will continue walking into stores for the foreseeable future. By testing how a consumer responds to different shopping environments, you can understand how to maximize value both for the customer and your brand during in-store shopping experiences.

Of course, the shelf is a critical point in the in-store customer journey. Heatmaps are a great way to understand optimal shelf placement and product combinations that will entice consumers to reach for your products. They can also help with the design of the shelf itself!

These are just five examples of how heatmaps can enhance your consumer research with visual, data-driven insights. They are a quick, fun way for consumers to provide feedback in a survey setting, and they make a great addition to any research report.

Start exploring new use cases and research projects with heatmaps! And of course, reach out to the team at SightX to learn more.

by Naira Musallam, PhD