Commit c0a6fc65 by Jaime Collado

INSTALL and README updated

parent 6e5a4700
# Installation instructions
In order to make use of this library, install it as follows:
1. Clone this repository: `git clone https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy.git`
2. Use this inside the project's folder: `python -m pip install .`
\ No newline at end of file
2. Use this inside the project's folder: `python -m pip install .`
This library has been tested in Python 3.8+
\ No newline at end of file
......@@ -7,13 +7,13 @@ This class provides methods for the calculation of different metrics on text. It
- [INSTALL.md](INSTALL.md): A guide to make this project work on your local environment.
## ./src/texty
- [analyzer.py](src/texty/analyzer.py): This class provides methods for the calculation of different metrics on text.
- [complexity.py](src/texty/complexity.py): This class provides methods for the calculation of different complexity metrics on text.
- [analyzer.py](src/texty/analyzer.py): This module provides a class with methods for the calculation of different metrics on text.
- [complexity.py](src/texty/complexity.py): This module provides a class with methods for the calculation of different complexity metrics on text.
- [CREA_total.txt](CREA_total.txt): A dataset of 737799 Spanish words ordered by their absolute frequency.
- [analyze_complexity.py](src/texty/analyze_complexity.py): A script that takes a .txt file and an output format as input and generates a file containing all metrics as calculated by the ComplexityAnalyzer class.
- [analyze_complexity.py](src/texty/analyze_complexity.py): Script that takes a .txt file and an output format as input and generates a file containing all metrics as calculated by the ComplexityAnalyzer class.
## ./examples
- [example_text.txt](examples/example_text.txt): A simple .txt file to test the library.
- [example_text.txt](examples/example_text.txt): Simple .txt file to test the library.
- [example.ipynb](examples/example.ipynb): Colab notebook that shows how to use the ComplexityAnalyzer class.
......@@ -34,3 +34,11 @@ In this section, we introduce the different metrics offered in this Python libra
* **Feature selection**: Remove features with low variance and SelectFromModel (L1-based feature selection)
* **kBest**: Selection of the k best features
# Usage
You can run _Texty_ from terminal as follows:
`analyze-complexity {text_file.txt} [-o output_format (csv, tsv or json)]`
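The invocation above could be parsed along the following lines. This is a minimal sketch using `argparse`, not the project's actual implementation: the function name `build_parser`, the long option `--output-format` and the `csv` default are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the usage line; not taken from the Texty sources.
    parser = argparse.ArgumentParser(prog="analyze-complexity")
    # Positional argument: the .txt file to analyze.
    parser.add_argument("text_file", help="path to the .txt file to analyze")
    # Optional output format, restricted to the three formats named in the README.
    parser.add_argument(
        "-o",
        "--output-format",
        choices=["csv", "tsv", "json"],
        default="csv",  # assumed default; the README does not state one
        help="format of the generated metrics file",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["example_text.txt", "-o", "json"])
print(args.text_file, args.output_format)
```

Restricting `-o` with `choices` makes an unsupported format fail at parse time rather than after the text has been analyzed.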
......@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"id": "5745bcf4",
"metadata": {},
"outputs": [],
......@@ -21,7 +21,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "63c5bfcb",
"metadata": {},
"outputs": [
......@@ -31,7 +31,7 @@
"'Veo que en este foro, afortunadamente para vosotros, no hay mucha gente que sufra de TOC.Si hay alguien por ahí, me gustaría que compartiérais vuestras opiniones, yo compruebo las cosas que hago porque tengo miedo de haberme equivocado y pienso en las consecuencias que ese error podría acarrearme, y las compruebo una y otra vez, y esto me angustia.\\nSé que abrí un post parecido hace tiempo, pero ya quedó abajo y por tanto en el olvido, por eso abro este por si alguna persona nueva con este problema lo lee.Me gustaría saber qué os recetan a vosotros para esto y si os va bien.\\n\\nSaludos.\\nNereida.'"
]
},
"execution_count": 2,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
......@@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "52fd5e8e",
"metadata": {},
"outputs": [
......@@ -68,39 +68,37 @@
"text": [
"Collecting es-core-news-sm==3.2.0\n",
" Downloading https://github.com/explosion/spacy-models/releases/download/es_core_news_sm-3.2.0/es_core_news_sm-3.2.0-py3-none-any.whl (14.0 MB)\n",
" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 48.3 MB/s eta 0:00:00\n",
" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 45.0 MB/s eta 0:00:00\n",
"Requirement already satisfied: spacy<3.3.0,>=3.2.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from es-core-news-sm==3.2.0) (3.2.3)\n",
"Requirement already satisfied: pathy>=0.3.5 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.6.1)\n",
"Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.0.6)\n",
"Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.9.0)\n",
"Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.0.6)\n",
"Requirement already satisfied: typer<0.5.0,>=0.3.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.4.0)\n",
"Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.0.6)\n",
"Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.27.1)\n",
"Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.0.7)\n",
"Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.0.9)\n",
"Requirement already satisfied: typer<0.5.0,>=0.3.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.4.0)\n",
"Requirement already satisfied: setuptools in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (61.0.0)\n",
"Requirement already satisfied: packaging>=20.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (21.3)\n",
"Requirement already satisfied: jinja2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.0.3)\n",
"Requirement already satisfied: blis<0.8.0,>=0.4.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.7.7)\n",
"Requirement already satisfied: pathy>=0.3.5 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.6.1)\n",
"Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.0.6)\n",
"Requirement already satisfied: srsly<3.0.0,>=2.4.1 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.4.2)\n",
"Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (4.63.1)\n",
"Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.0.1)\n",
"Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.9.0)\n",
"Requirement already satisfied: thinc<8.1.0,>=8.0.12 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (8.0.15)\n",
"Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.8.2)\n",
"Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.0.6)\n",
"Requirement already satisfied: setuptools in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (61.0.0)\n",
"Requirement already satisfied: numpy>=1.15.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.22.3)\n",
"Requirement already satisfied: jinja2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.1.1)\n",
"Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.27.1)\n",
"Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.0.9)\n",
"Requirement already satisfied: thinc<8.1.0,>=8.0.12 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (8.0.15)\n",
"Requirement already satisfied: packaging>=20.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (21.3)\n",
"Requirement already satisfied: blis<0.8.0,>=0.4.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (0.7.7)\n",
"Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.3.0)\n",
"Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.8.2)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from packaging>=20.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.0.7)\n",
"Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from pathy>=0.3.5->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (5.2.1)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (4.1.1)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (1.26.9)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2021.10.8)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.0.12)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2021.10.8)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (3.3)\n",
"Requirement already satisfied: click<9.0.0,>=7.1.1 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from typer<0.5.0,>=0.3.0->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (8.0.4)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /home/jcollado/miniconda3/envs/text-analysis/lib/python3.9/site-packages (from jinja2->spacy<3.3.0,>=3.2.0->es-core-news-sm==3.2.0) (2.1.1)\n",
"Installing collected packages: es-core-news-sm\n",
"Successfully installed es-core-news-sm-3.2.0\n",
"\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n",
"You can now load the package via spacy.load('es_core_news_sm')\n"
]
......@@ -134,17 +132,14 @@
" 'crawford': 4.851558558558558}"
]
},
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"try:\n",
" nlp = spacy.load(\"es_core_news_sm\")\n",
"except:\n",
" spacy.cli.download(\"es_core_news_sm\")\n",
" nlp = spacy.load(\"es_core_news_sm\")\n",
"spacy.cli.download(\"es_core_news_sm\")\n",
"nlp = spacy.load(\"es_core_news_sm\")\n",
"\n",
"ca = ComplexityAnalyzer(\"es\", nlp)\n",
"\n",
......
......@@ -25,11 +25,8 @@ def analyze_complexity(args=None):
exit()
# Instantiate the ComplexityAnalyzer class
try:
nlp = spacy.load("es_core_news_sm")
except:
spacy.cli.download("es_core_news_sm")
nlp = spacy.load("es_core_news_sm")
spacy.cli.download("es_core_news_sm")
nlp = spacy.load("es_core_news_sm")
complexity_analyzer = ComplexityAnalyzer("es", nlp)
# Read file
......
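The hunk above replaces the try/except fallback with an unconditional `spacy.cli.download` call, so the model is fetched on every run. One possible alternative, sketched here under assumptions, keeps the fallback but catches only `OSError`, which is the error spaCy raises when a model package cannot be found. The helper name `load_or_download` and its callable parameters are hypothetical and do not appear in the commit; the callables are injected so the sketch runs without spaCy installed.

```python
from typing import Any, Callable


def load_or_download(
    name: str,
    load: Callable[[str], Any],
    download: Callable[[str], Any],
) -> Any:
    """Load a model by name, downloading it first only if loading fails."""
    try:
        return load(name)
    except OSError:
        # Assumption: a missing model surfaces as OSError (as spaCy does).
        download(name)
        return load(name)


# Intended use with spaCy (not executed here):
#   import spacy
#   nlp = load_or_download("es_core_news_sm", spacy.load, spacy.cli.download)
```

Catching a specific exception instead of a bare `except:` avoids hiding unrelated failures such as `KeyboardInterrupt`.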