Norwegian Orthographic Analyzer

The Norwegian Orthographic Analyzer (NOA) is an online tool designed to search a database of Norwegian words and analyze their orthographic, grammatical, and statistical properties.

Link to the online tool

https://noa.spell.uiocloud.no/

Background

NOA was developed by SciFy (https://www.scify.gr/site/en/) and funded by ISP internal research funding to Linda Larsen, Vasiliki Diamanti, and Athanassios Protopapas. Linda Larsen carried out the manual annotation and quality assurance of transcriptions, and Athanassios Protopapas provided the specifications and supervision. The tool is now maintained and hosted by Oslo SPeLL.

Word database/corpus

The corpus is compiled from a large collection of Norwegian subtitles for film and TV, and was used to calculate the word frequencies and other word properties. Using subtitles has the advantage of making the data more relevant to spoken language. The Oslo-Bergen tagger was used to assign grammatical classes (parts of speech) to the words in the subtitle corpus.

How to use

Register a user account

Analyze a single word

To get statistics and data for a single word, navigate to https://noa.spell.uiocloud.no/, type in your word, and click "Analyze". Hover your mouse over a table heading to display its meaning (e.g. NOV = number of vowels).
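To illustrate what a measure such as NOV captures, here is a minimal Python sketch that counts the vowel letters in a word. It assumes the nine Norwegian vowel letters (a, e, i, o, u, y, æ, ø, å). The name count_vowels is hypothetical and is not part of NOA, and the tool's own computation may differ.

```python
# Hypothetical helper, not part of NOA: counts vowel letters in a word.
# Assumption: the vowel set is the nine Norwegian vowel letters.
NORWEGIAN_VOWELS = set("aeiouyæøå")


def count_vowels(word: str) -> int:
    """Return the number of vowel letters in `word` (case-insensitive)."""
    return sum(1 for ch in word.lower() if ch in NORWEGIAN_VOWELS)


if __name__ == "__main__":
    # Print the count for a few example words.
    for w in ["bære", "språk", "Øye"]:
        print(w, count_vowels(w))
```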

Upload a list of words to analyze

To get statistics and data for a list of words, navigate to https://noa.spell.uiocloud.no/, click "Analyze from file", "Choose File", and finally "Analyze file".

Browse corpus

Click "Corpus" to open a table of all the words included in the corpus. This requires a user account.

Browse available search rules

Click "Rules" to open a table of the rules that can be used when searching for specific words.

Multi-criteria search

Click "Search" to search for words in the corpus by specific criteria. For example, you can search for verbs with fewer than 7 letters and a frequency greater than 5 that contain an "æ" followed by "r". This requires a user account.
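The logic of such a combined search can be sketched in Python as a filter over word records. This is only an illustration of how the criteria combine. The Entry class, the search function, and the sample words with their frequency values are all invented for this sketch and are not taken from NOA or its corpus.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    pos: str          # part of speech, e.g. "verb"
    frequency: float  # made-up value, for illustration only


# Invented sample data; not taken from the NOA corpus.
SAMPLE = [
    Entry("være", "verb", 120.0),
    Entry("lære", "verb", 8.5),
    Entry("forklare", "verb", 30.0),    # too long, and no "ær"
    Entry("ærlig", "adjective", 12.0),  # not a verb
    Entry("skjære", "verb", 2.0),       # frequency too low
    Entry("overvære", "verb", 6.0),     # too long
]


def search(entries):
    """Verbs with fewer than 7 letters, frequency > 5, containing 'ær'."""
    return [
        e.word
        for e in entries
        if e.pos == "verb"
        and len(e.word) < 7
        and e.frequency > 5
        and "ær" in e.word
    ]


if __name__ == "__main__":
    print(search(SAMPLE))
```

All four conditions must hold at once, so each additional criterion can only narrow the result.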

Available criteria:

POS (Part of speech): Filter by part of speech, such as adjectives, adverbs, nouns, verbs, etc.
CValues: Filter by various statistical properties, such as number of letters, frequency, etc.
PRules: Filter by available pre-defined phonological rules, such as "b_followedBy_b", "c_laastLetter", etc.
Regex: Filter by regular expressions, which allow you to search for patterns in text that match a particular syntax. For example, you could use a regular expression to find all instances of words that start with the letter "a" and end with the letter "t", or all instances of words that contain a particular sequence of letters.
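The two regular-expression examples mentioned above can be tried out in Python before entering them in the tool. This sketch assumes Python's re syntax. This page does not state which regex dialect NOA accepts, so the patterns may need adjusting. The word list and the helper name matching are invented for illustration.

```python
import re

# Invented word list, for illustration only.
WORDS = ["alt", "absolutt", "avis", "bart", "lære", "være", "ærlig"]

# Words that start with "a" and end with "t".
starts_a_ends_t = re.compile(r"^a.*t$")

# Words that contain "æ" immediately followed by "r".
ae_then_r = re.compile(r"ær")


def matching(pattern, words):
    """Return the words in which `pattern` matches somewhere."""
    return [w for w in words if pattern.search(w)]


if __name__ == "__main__":
    print(matching(starts_a_ends_t, WORDS))
    print(matching(ae_then_r, WORDS))
```

The anchors ^ and $ tie the first pattern to the start and end of the word. The second pattern has no anchors, so it matches the letter sequence anywhere in the word.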

Published June 1, 2023 10:14 - Last modified June 1, 2023 10:14