- [TextComplexitySpacy.py](TextComplexitySpacy.py): This class provides methods for the calculation of different complexity metrics on text.
# Complexity metrics
This section introduces the complexity metrics offered by this Python library. They were proposed by different authors for different languages (Spanish, English, French, etc.).
*NOTE: for full citation details, please refer to the papers listed above.*
* **Lexical complexity**: Proposed by Anula (2018), this measure determines the lexical complexity of a text from word frequency and lexical density. It is based on the number of distinct content words per sentence (*Lexical Complexity Index, LC*) and on the number of low-frequency words per 100 content words (*Index of Low Frequency Words, ILFW*). The higher the LC index, the greater the difficulty in reading comprehension.
* **Spaulding readability**: Commonly known as the SSR index, it was proposed by Spaulding in 1956. It measures vocabulary and sentence structure to predict the relative reading difficulty of a text. Its formula was empirically adjusted to try to keep the score between 0 and 1.
* **Complexity of sentences**: The Sentence Complexity Index (SCI) was proposed by Anula (2018) as a measure of sentence complexity in literary texts aimed at second-language learners. This syntactic complexity measure is based on the number of words per sentence (*Average Sentence Length, ASL*) and on the number of complex sentences per sentence (*Complex Sentences, CS*).
* **Automated Readability Index (ARI)**: Proposed by Senter and Smith in 1967, the *Automated Readability Index*, better known as ARI, is one of the most widely used indexes because it is easy to compute. It measures the difficulty of a text from the average number of characters (letters and numbers) per word and the average number of words per sentence.
* **Dependency tree depth**: Proposed by Saggion et al. (2015), this metric captures syntactic complexity: a long sentence may be syntactically complex, or it may simply contain many modifiers (adjectives, adverbs or adverbial phrases). It complements the ASL measure because it captures syntactic complexity in terms of recursive or nested structures.
* **Punctuation marks**: This measure was also proposed by Saggion et al. The average number of punctuation marks in a text is used as one of the indicators of its simplicity.
* **Readability of Fernández-Huerta**: Blanco (2002) and Ramírez (2013) propose this complexity measure as a Spanish adaptation of Flesch's readability test (Flesch, 1948).
* **Readability of Flesch-Szigriszt (IFSZ)**: The works of Barrio et al. (2008) and Ramírez et al. (2013) propose the Flesch-Szigriszt readability index, a modification of the Flesch formula adapted to Spanish by Szigriszt-Pazos in 1993. This index is currently considered a reference for the Spanish language. It is based on the number of syllables per word and the number of words per sentence in the text.
* **Comprehensibility of Gutiérrez de Polini**: Originally developed in 1972, this metric is not an adaptation of an English formula; it was designed for Spanish from the outset (Rodriguez, 1980). It is based on the average number of letters per word and the average number of words per sentence.
* **mu Readability**: Developed by Muñoz in 2006, this formula yields a readability index between 0 and 100. It is based on the number of words, the average number of letters per word and the variance of the number of letters per word.
* **Minimum age to understand**: García (2001) proposes a formula to estimate the minimum age needed to understand a text. It is, again, a Spanish adaptation of Flesch's original formula for English, based on the average number of syllables per word and the average number of words per sentence.
* **SOL Readability**: Contreras et al. (1999) propose the SOL metric as a Spanish adaptation of the SMOG formula proposed by McLaughlin (1969). It expresses the readability of a text as a grade level, i.e. the number of years of schooling required to understand the text.
* **Years Crawford**: Proposed by Alan N. Crawford in 1989, this measure estimates the years of schooling required to understand a text from the number of sentences per 100 words and the number of syllables per 100 words.
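
Several of the formulas above reduce to arithmetic over a few text counts. The following minimal sketch implements ARI, the Flesch-Szigriszt index and the Gutiérrez de Polini comprehensibility with their commonly published coefficients. It assumes the character, letter, syllable, word and sentence counts are computed beforehand (syllable counting for Spanish is not shown), and the function names are hypothetical rather than the API of `TextComplexitySpacy.py`.

```python
def ari(n_chars: int, n_words: int, n_sentences: int) -> float:
    """Automated Readability Index (Senter and Smith, 1967)."""
    # Characters (letters and digits) per word, words per sentence.
    return 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43


def flesch_szigriszt(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """Flesch-Szigriszt index (IFSZ) for Spanish."""
    # Syllables per word and words per sentence.
    return 206.835 - 62.3 * (n_syllables / n_words) - (n_words / n_sentences)


def gutierrez_polini(n_letters: int, n_words: int, n_sentences: int) -> float:
    """Gutiérrez de Polini comprehensibility for Spanish."""
    # Letters per word and words per sentence.
    return 95.2 - 9.7 * (n_letters / n_words) - 0.35 * (n_words / n_sentences)


# Hypothetical counts for a short text: 2 sentences, 20 words.
print(ari(n_chars=100, n_words=20, n_sentences=2))
print(flesch_szigriszt(n_syllables=40, n_words=20, n_sentences=2))
print(gutierrez_polini(n_letters=100, n_words=20, n_sentences=2))
```

Keeping the counting separate from the formulas makes it easy to swap in a different tokenizer or syllabifier without touching the arithmetic.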
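
A similar sketch covers the Fernández-Huerta readability, García's minimum age and Crawford's years of schooling. The coefficients are the ones commonly cited for these formulas; treating the Fernández-Huerta term F as words per sentence is an assumption (sources differ on it), so the values should be checked against `TextComplexitySpacy.py` before relying on them. All function names are hypothetical and the counts are assumed to be precomputed.

```python
def fernandez_huerta(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """Fernández-Huerta readability: 206.84 - 0.60*P - 1.02*F.

    P = syllables per 100 words; F is ASSUMED to be words per sentence.
    """
    p = 100 * n_syllables / n_words
    f = n_words / n_sentences
    return 206.84 - 0.60 * p - 1.02 * f


def minimum_age(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """García (2001) minimum age, in years, needed to understand a text."""
    return (0.2495 * (n_words / n_sentences)
            + 6.4763 * (n_syllables / n_words)
            - 7.1395)


def crawford_years(n_syllables: int, n_words: int, n_sentences: int) -> float:
    """Crawford's years of schooling from sentences and syllables per 100 words."""
    sentences_per_100 = 100 * n_sentences / n_words
    syllables_per_100 = 100 * n_syllables / n_words
    return -0.205 * sentences_per_100 + 0.049 * syllables_per_100 - 3.407


# Hypothetical counts: 40 syllables, 20 words, 2 sentences.
for metric in (fernandez_huerta, minimum_age, crawford_years):
    print(metric.__name__, metric(n_syllables=40, n_words=20, n_sentences=2))
```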
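
The mu readability index can be computed directly from word lengths. This sketch assumes the usual form μ = (n / (n − 1)) · (x̄ / σ²) · 100, where n is the number of words, x̄ the mean number of letters per word and σ² their variance. Using the population variance and counting only alphabetic characters as letters are assumptions of this example, and `mu_readability` is a hypothetical name.

```python
from statistics import mean, pvariance
from typing import List


def mu_readability(words: List[str]) -> float:
    """mu readability: (n / (n - 1)) * (mean / variance) * 100.

    Mean and variance are taken over the number of letters per word;
    the population variance is assumed.
    """
    lengths = [sum(ch.isalpha() for ch in w) for w in words]
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two words")
    var = pvariance(lengths)
    if var == 0:
        raise ValueError("all words have the same length; variance is zero")
    return (n / (n - 1)) * (mean(lengths) / var) * 100


print(mu_readability("El gato negro duerme en el sofá".split()))
```

Accented letters such as "á" count as letters because `str.isalpha` is Unicode-aware.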
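
SOL is usually computed as a linear rescaling of the SMOG grade. In this sketch the SMOG grade follows McLaughlin's formula (polysyllables are words of three or more syllables, normalized to a 30-sentence sample), while the SOL coefficients (−2.51 and 0.74) are the values commonly quoted for Contreras et al. (1999) and should be treated as an assumption. The polysyllable and sentence counts are assumed to be precomputed, and the function names are hypothetical.

```python
import math


def smog(n_polysyllables: int, n_sentences: int) -> float:
    """McLaughlin's SMOG grade; polysyllables are words of 3+ syllables."""
    return 1.0430 * math.sqrt(n_polysyllables * 30 / n_sentences) + 3.1291


def sol(n_polysyllables: int, n_sentences: int) -> float:
    """SOL grade for Spanish as a rescaled SMOG (coefficients assumed)."""
    return -2.51 + 0.74 * smog(n_polysyllables, n_sentences)


print(sol(n_polysyllables=12, n_sentences=10))
```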
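
The two measures by Saggion et al. can be sketched over pre-parsed input. Here each sentence's dependency parse is given as a list of head indices relative to the sentence, with the root pointing to itself (spaCy follows that convention for `token.head`, so such a list could be built from `token.head.i - sent.start`). Reporting the mean of the per-sentence maximum depths, counting punctuation per sentence, and the particular punctuation set are assumptions of this example; the function names are hypothetical.

```python
import string
from statistics import mean
from typing import List

# ASCII punctuation plus common Spanish marks (an assumed set).
PUNCTUATION = set(string.punctuation) | set("¿¡«»…")


def tree_depth(heads: List[int]) -> int:
    """Maximum depth of a dependency tree given each token's head index.

    The root token is its own head; a sentence with only a root has depth 1.
    """
    depth = [0] * len(heads)

    def depth_of(i: int) -> int:
        if depth[i] == 0:
            # Memoized walk up towards the root.
            depth[i] = 1 if heads[i] == i else depth_of(heads[i]) + 1
        return depth[i]

    return max(depth_of(i) for i in range(len(heads)))


def mean_tree_depth(parsed_sentences: List[List[int]]) -> float:
    """Average of the per-sentence maximum depths."""
    return float(mean(tree_depth(heads) for heads in parsed_sentences))


def mean_punctuation(sentences: List[str]) -> float:
    """Average number of punctuation characters per sentence."""
    return float(mean(sum(ch in PUNCTUATION for ch in s) for s in sentences))


# "El gato duerme." -> El->gato, gato->duerme, duerme is the root, "."->duerme
print(tree_depth([1, 2, 2, 2]))
print(mean_punctuation(["¿Vienes?", "Sí, claro."]))
```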
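
Anula's lexical and sentence indices can be sketched in the same way. The example assumes the content words (lemmas) of each sentence have already been extracted, uses a hypothetical `low_freq` set in place of a real frequency dictionary, and counts distinct content words over the whole text. Detecting complex sentences is left to the caller, and combining ASL and CS into SCI by averaging them is an assumption of this sketch, not something stated above.

```python
from typing import List, Set, Tuple


def lexical_indices(sentences: List[List[str]],
                    low_freq: Set[str]) -> Tuple[float, float]:
    """Return (distinct content words per sentence, ILFW).

    `sentences` holds the content words of each sentence (function words
    already removed); `low_freq` is a hypothetical set of lowercase words
    regarded as low-frequency.
    """
    content = [w.lower() for sent in sentences for w in sent]
    distinct_per_sentence = len(set(content)) / len(sentences)
    # Low-frequency words per 100 content words.
    ilfw = 100 * sum(w in low_freq for w in content) / len(content)
    return distinct_per_sentence, ilfw


def sentence_complexity(n_words: int, n_sentences: int,
                        n_complex: int) -> Tuple[float, float, float]:
    """Return (ASL, CS, SCI); SCI as the mean of ASL and CS is ASSUMED."""
    asl = n_words / n_sentences
    cs = n_complex / n_sentences
    return asl, cs, (asl + cs) / 2


print(lexical_indices([["gato", "dormir"], ["gato", "ronronear"]], {"ronronear"}))
print(sentence_complexity(n_words=20, n_sentences=2, n_complex=1))
```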