Commit 6e5a4700 by Jaime Collado

Updated INSTALL.md and README.md, removed unused files

parent 3f8e23af
# Installation instructions
## Requirements
- Python, with the following packages:
  - `pip install numpy`
  - `pip install spacy`
  - `pip install pandas`
  - `pip install lexical-diversity`
  - `pip install syllables`
  - `pip install transformers`
## Installation procedure
1. Install all the previous requirements.
2. Download or clone this repository, for example: `git clone https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy.git`
3. From inside the project's folder, run: `python -m pip install .`
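After following the steps above, you can run a check like the one below to confirm that the Python requirements are importable. This is a sketch and not part of the project. The module names are assumed import names for the packages in the Requirements list; for instance, the `lexical-diversity` distribution is imported as `lexical_diversity`. Also, `importlib.util.find_spec` only checks that a module can be found, not that it works.

```python
import importlib.util

# Assumed import names for the packages in the Requirements list
# ("lexical-diversity" is installed under that name but imported as "lexical_diversity").
REQUIRED_MODULES = ["numpy", "spacy", "pandas", "lexical_diversity", "syllables", "transformers"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in the current environment."""
    # find_spec locates a top-level module without importing it; None means "not found".
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements found.")
```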
# Files
- [INSTALL.md](INSTALL.md): A guide to make this project work on your local environment.
- [TextAnalysisSpacy.py](TextAnalysisSpacy.py): This class provides methods for the calculation of different metrics on text.
- [TextComplexitySpacy.py](TextComplexitySpacy.py): This class provides methods for the calculation of different complexity metrics on text.
## ./src/texty
- [analyzer.py](src/texty/analyzer.py): This class provides methods for the calculation of different metrics on text.
- [complexity.py](src/texty/complexity.py): This class provides methods for the calculation of different complexity metrics on text.
- [CREA_total.txt](CREA_total.txt): A dataset of 737799 Spanish words ordered by absolute frequency.
- [example.ipynb](example.ipynb): Colab notebook that shows how to use the TextAnalysisSpacy class on several texts.
- [analyze_complexity.py](src/texty/analyze_complexity.py): A script that takes a .txt file and an output format as input and generates a file containing all metrics as calculated by the ComplexityAnalyzer class.
## ./examples
- [example_text.txt](examples/example_text.txt): A simple .txt file to test the library.
- [example.ipynb](examples/example.ipynb): Colab notebook that shows how to use the ComplexityAnalyzer class.
# Metrics
This section introduces the metrics offered by this Python library for Spanish and English.
* **Volumetry**: Number of words, number of unique words, number of characters and average word length for each text. The same volumetrics are then calculated for each category.
* **Lemmas**: Number and length of distinct lemmas per text. Average and variance of the number and length of distinct lemmas by category. Most frequent lemmas by category.
* **Part-of-speech (POS)**: POS analysis for each text. POS analysis for each category. Most frequent words by POS.
* **Lexical diversity**: Lexical diversity for each text (simple_TTR, root_TTR, log_TTR, maas_TTR, MSTTR, MATTR, HDD, MTLD). Lexical diversity for each category.
* **Complexity**: Complexity metrics for each text. Complexity metrics for each category.
* **Feature selection**: Removal of low-variance features, and SelectFromModel (L1-based feature selection).
* **kBest**: Selection of the k best features
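The sketch below makes a few of the volumetry and lexical-diversity definitions concrete. It has no dependencies and is not the library's implementation. The regex tokenizer is an assumption standing in for spaCy. Characters are counted over word tokens only, because the README does not say whether spaces or punctuation count. The TTR variants use their common textbook definitions. The project lists `lexical-diversity` as a requirement, and that package may compute them differently. The `window` default for MATTR is also an assumption.

```python
import math
import re
from statistics import mean


def tokenize(text):
    # Assumption: a naive regex tokenizer (lowercased word characters);
    # the library itself relies on spaCy for tokenization.
    return re.findall(r"\w+", text.lower())


def volumetry(tokens):
    # Characters are counted over word tokens only (an assumption of this sketch).
    n_chars = sum(len(t) for t in tokens)
    return {
        "n_words": len(tokens),
        "n_unique_words": len(set(tokens)),
        "n_chars": n_chars,
        "avg_word_length": n_chars / len(tokens),
    }


def ttr_variants(tokens):
    # Type-token ratio variants: n = tokens, v = types (distinct tokens).
    n, v = len(tokens), len(set(tokens))
    if n < 2:
        raise ValueError("need at least two tokens (log(1) = 0 in the denominators)")
    return {
        "simple_TTR": v / n,
        "root_TTR": v / math.sqrt(n),
        "log_TTR": math.log(v) / math.log(n),
        "maas_TTR": (math.log(n) - math.log(v)) / math.log(n) ** 2,
    }


def mattr(tokens, window=50):
    # Moving-average TTR: mean TTR over every run of `window` consecutive tokens.
    # Falling back to simple TTR for texts no longer than the window is a choice of this sketch.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    return mean(
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    )


if __name__ == "__main__":
    tokens = tokenize("El gato y el perro")
    print(volumetry(tokens))
    print(ttr_variants(tokens))
    print(mattr(tokens, window=3))
```

For "El gato y el perro" the tokenizer yields five tokens and four types, so `simple_TTR` is 4/5. Per-category figures could then be obtained by averaging these per-text values over the texts in each category.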
blis==0.7.7; python_version >= "3.6"
catalogue==2.0.7; python_version >= "3.6"
certifi==2021.10.8; python_full_version >= "3.6.0" and python_version >= "3.6"
charset-normalizer==2.0.12; python_full_version >= "3.6.0" and python_version >= "3.6"
click==8.0.4; python_version >= "3.7" and python_full_version >= "3.6.0"
colorama==0.4.4; python_full_version >= "3.6.0" and platform_system == "Windows" and python_version >= "3.7" and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
cycler==0.11.0; python_version >= "3.7"
cymem==2.0.6; python_version >= "3.6"
filelock==3.6.0; python_version >= "3.7" and python_full_version >= "3.6.0"
fonttools==4.31.2; python_version >= "3.7"
huggingface-hub==0.4.0; python_full_version >= "3.6.0"
idna==3.3; python_full_version >= "3.6.0" and python_version >= "3.6"
jinja2==3.1.1; python_version >= "3.7"
joblib==1.1.0; python_version >= "3.7" and python_full_version >= "3.6.0"
kiwisolver==1.4.1; python_version >= "3.7"
langcodes==3.3.0; python_version >= "3.6"
lexical-diversity==0.1.1
markupsafe==2.1.1; python_version >= "3.7"
matplotlib==3.5.1; python_version >= "3.7"
murmurhash==1.0.6; python_version >= "3.6"
nltk==3.7; python_version >= "3.7"
numpy==1.22.3; python_version >= "3.8"
packaging==21.3; python_version >= "3.7" and python_full_version >= "3.6.0"
pandas==1.4.1; python_version >= "3.8"
pathy==0.6.1; python_version >= "3.6"
pillow==9.0.1; python_version >= "3.7"
preshed==3.0.6; python_version >= "3.6"
pydantic==1.8.2; python_full_version >= "3.6.1" and python_version >= "3.6"
pyparsing==3.0.7; python_version >= "3.7" and python_full_version >= "3.6.0"
python-dateutil==2.8.2; python_version >= "3.8" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.8"
pytz==2022.1; python_version >= "3.8"
pyyaml==6.0; python_version >= "3.6" and python_full_version >= "3.6.0"
regex==2022.3.15; python_version >= "3.7" and python_full_version >= "3.6.0"
requests==2.27.1; python_full_version >= "3.6.0" and python_version >= "3.6"
sacremoses==0.0.49; python_full_version >= "3.6.0"
scipy==1.6.1; python_version >= "3.7"
seaborn==0.11.2; python_version >= "3.6"
setuptools-scm==6.4.2; python_version >= "3.7"
six==1.16.0; python_full_version >= "3.6.0" and python_version >= "3.8"
smart-open==5.2.1; python_version >= "3.6" and python_version < "4.0"
spacy-legacy==3.0.9; python_version >= "3.6"
spacy-loggers==1.0.1; python_version >= "3.6"
spacy==3.2.3; python_version >= "3.6"
srsly==2.4.2; python_version >= "3.6"
syllables==1.0.3; python_version >= "2.7"
thinc==8.0.15; python_version >= "3.6"
tokenizers==0.11.6; python_full_version >= "3.6.0"
tomli==2.0.1; python_version >= "3.7"
tqdm==4.63.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
transformers==4.17.0; python_full_version >= "3.6.0"
typer==0.4.0; python_version >= "3.6"
typing-extensions==4.1.1; python_full_version >= "3.6.1" and python_version >= "3.6"
urllib3==1.26.9; python_full_version >= "3.6.0" and python_version < "4" and python_version >= "3.6"
wasabi==0.9.0; python_version >= "3.6"
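Each line of the pinned dependency list above fixes one distribution to an exact version. It may also append a PEP 508 environment marker after `;`. For example, `numpy==1.22.3` is only meant to be installed when `python_version >= "3.8"`. The standard-library sketch below splits such lines into name, version and raw marker text. It does not evaluate the markers. Evaluating them properly would need a marker parser such as `packaging.markers.Marker`, from the `packaging` distribution pinned in the same list.

```python
def parse_pin(line):
    """Split a pinned requirement such as 'numpy==1.22.3; python_version >= "3.8"'
    into (name, version, marker). The marker is returned as raw text, not evaluated."""
    spec, _, marker = line.partition(";")
    name, _, version = spec.partition("==")
    return name.strip(), version.strip(), marker.strip() or None


def parse_requirements(text):
    """Parse every non-empty, non-comment line of a pinned requirements listing."""
    pins = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            pins.append(parse_pin(line))
    return pins


if __name__ == "__main__":
    sample = 'numpy==1.22.3; python_version >= "3.8"\nlexical-diversity==0.1.1'
    for name, version, marker in parse_requirements(sample):
        print(name, version, marker)
```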
[metadata]
name = texty
version = 0.0.1
author = Alba María Mármol
author_email = ammarmol@ujaen.es
description = Text analysis and processing package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy
project_urls =
    Bug Tracker = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy/issues
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.6

[options.entry_points]
console_scripts =
    analyze-complexity = texty.analyze_complexity:analyze_complexity

[options.packages.find]
where = src
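In setup.cfg, multi-line values such as `console_scripts`, `classifiers` or `package_dir` only work when their continuation lines are indented. An unindented line is read as a new key instead. The sketch below uses the standard-library `configparser` to extract the `analyze-complexity` entry point. This approximates how setuptools reads the file; it is not setuptools' actual code path. The entry point maps the installed `analyze-complexity` command to the `analyze_complexity` function in the `texty.analyze_complexity` module. The embedded excerpt and the helper name `console_scripts` are illustrative.

```python
import configparser

# Illustrative excerpt of the project's setup.cfg (continuation lines indented).
SETUP_CFG = """\
[options]
package_dir =
    = src
packages = find:

[options.entry_points]
console_scripts =
    analyze-complexity = texty.analyze_complexity:analyze_complexity
"""


def console_scripts(cfg_text):
    """Return {command: (module, function)} from [options.entry_points]."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    # Each non-empty line of the multi-line value is "command = module:function".
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        name, target = (part.strip() for part in line.split("=", 1))
        module, func = target.split(":", 1)
        scripts[name] = (module, func)
    return scripts


if __name__ == "__main__":
    print(console_scripts(SETUP_CFG))
```

If the `analyze-complexity = ...` line is not indented, `configparser` treats it as a separate key. `console_scripts` then has an empty value, and no command is found.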