Commit 6e5a4700 by Jaime Collado

Updated INSTALL.md and README.md, removed unused files

parent 3f8e23af
 # Installation instructions
-## Requirements
-- python:
-- pip install numpy
-- pip install spacy
-- pip install pandas
-- pip install lexical-diversity
-- pip install syllables
-- pip install transformers
-## Installation procedure
-1. Install all the previous requirements.
-2. Download or clone this repository.
+1. Clone this repository: `git clone https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy.git`
+2. Use this inside the project's folder: `python -m pip install .`
\ No newline at end of file
@@ -5,26 +5,32 @@ This class provides methods for the calculation of different metrics on text. It
 # Files
 - [INSTALL.md](INSTALL.md): A guide to make this project work on your local environment.
-- [TextAnalysisSpacy.py](TextAnalysisSpacy.py): This class provides methods for the calculation of different metrics on text.
-- [TextComplexitySpacy.py](TextComplexitySpacy.py): This class provides methods for the calculation of different complexity metrics on text.
+## ./src/texty
+- [analyzer.py](src/texty/analyzer.py): This class provides methods for the calculation of different metrics on text.
+- [complexity.py](src/texty/complexity.py): This class provides methods for the calculation of different complexity metrics on text.
 - [CREA_total.txt](CREA_total.txt): A dataset of 737799 spanish words ordered by its absolute frequency.
-- [example.ipynb](example.ipynb): Colab notebook that shows how to use the TextAnalisysSpacy class in several texts.
+- [analyze_complexity.py](src/texty/analyze_complexity.py): A script that takes a .txt file and an output format as input and generates a file containing all metrics as calculated by the ComplexityAnalyzer class.
+## ./examples
+- [example_text.txt](examples/example_text.txt): A simple .txt file to test the library.
+- [example.ipynb](examples/example.ipynb): Colab notebook that shows how to use the ComplexityAnalyzer class.
 # Metrics
 In this section, we introduce the different metrics offered in this Python library for different languages (Spanish, English).
-* **Volumetry**: here it calculates the number of words, number of unique words, number of characters and average word length for text. Then it is calculated volumetrics for each category.
+* **Volumetry**: Here it calculates the number of words, number of unique words, number of characters and average word length for text. Then it is calculated volumetrics for each category.
 * **Lemmas**: Number and length of different lemmas per text. Average and variance of different lemmas and length by category. Most frequent lemmas by category.
-* **Part-of-speech(POS)**:POS analysis for each text. POS analysis for each category. Most frequent words by POS.
-* **Lexical_diversity**: Lexical diversity for each text (simple_TTR, root_TTR, log_TTR, maas_TTR, MSTTR, MATTR, HDD, MTLD). Lexical diversity for each category.
-* **Complexity**:Complexity diversity for each category. Complexity diversity for each category.
-* **FeatureSelection**: Remove features with low variance and SelectFromModel (Selection of functions based on L1)
+* **Part-of-speech (POS)**: POS analysis for each text. POS analysis for each category. Most frequent words by POS.
+* **Lexical diversity**: Lexical diversity for each text (simple_TTR, root_TTR, log_TTR, maas_TTR, MSTTR, MATTR, HDD, MTLD). Lexical diversity for each category.
+* **Complexity**: Complexity diversity for each category. Complexity diversity for each category.
+* **Feature selection**: Remove features with low variance and SelectFromModel (Selection of functions based on L1)
 * **kBest**: Selection of the k best features
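The Volumetry and Lexical diversity entries in the README describe simple counting metrics. The sketch below shows roughly what those counts look like, using only the standard library. It is not the code in `analyzer.py`: the function names `volumetry` and `simple_ttr`, the regex tokenizer, and the choice to count whitespace as characters are all assumptions made for illustration (the package itself tokenizes with spaCy).

```python
import re


def _tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens via a regex, not the spaCy
    # tokenizer that the package actually depends on.
    return re.findall(r"\w+", text.lower())


def volumetry(text: str) -> dict:
    """Number of words, unique words, characters, and average word length."""
    words = _tokenize(text)
    n_words = len(words)
    return {
        "words": n_words,
        "unique_words": len(set(words)),
        # Assumption: characters include whitespace and punctuation.
        "characters": len(text),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


def simple_ttr(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = _tokenize(text)
    return len(set(words)) / len(words) if words else 0.0
```

For the text `"the cat and the dog"`, this sketch counts 5 words, 4 unique words, and a type-token ratio of 0.8.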
This diff could not be displayed because it is too large.
blis==0.7.7; python_version >= "3.6"
catalogue==2.0.7; python_version >= "3.6"
certifi==2021.10.8; python_full_version >= "3.6.0" and python_version >= "3.6"
charset-normalizer==2.0.12; python_full_version >= "3.6.0" and python_version >= "3.6"
click==8.0.4; python_version >= "3.7" and python_full_version >= "3.6.0"
colorama==0.4.4; python_full_version >= "3.6.0" and platform_system == "Windows" and python_version >= "3.7" and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
cycler==0.11.0; python_version >= "3.7"
cymem==2.0.6; python_version >= "3.6"
filelock==3.6.0; python_version >= "3.7" and python_full_version >= "3.6.0"
fonttools==4.31.2; python_version >= "3.7"
huggingface-hub==0.4.0; python_full_version >= "3.6.0"
idna==3.3; python_full_version >= "3.6.0" and python_version >= "3.6"
jinja2==3.1.1; python_version >= "3.7"
joblib==1.1.0; python_version >= "3.7" and python_full_version >= "3.6.0"
kiwisolver==1.4.1; python_version >= "3.7"
langcodes==3.3.0; python_version >= "3.6"
lexical-diversity==0.1.1
markupsafe==2.1.1; python_version >= "3.7"
matplotlib==3.5.1; python_version >= "3.7"
murmurhash==1.0.6; python_version >= "3.6"
nltk==3.7; python_version >= "3.7"
numpy==1.22.3; python_version >= "3.8"
packaging==21.3; python_version >= "3.7" and python_full_version >= "3.6.0"
pandas==1.4.1; python_version >= "3.8"
pathy==0.6.1; python_version >= "3.6"
pillow==9.0.1; python_version >= "3.7"
preshed==3.0.6; python_version >= "3.6"
pydantic==1.8.2; python_full_version >= "3.6.1" and python_version >= "3.6"
pyparsing==3.0.7; python_version >= "3.7" and python_full_version >= "3.6.0"
python-dateutil==2.8.2; python_version >= "3.8" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.8"
pytz==2022.1; python_version >= "3.8"
pyyaml==6.0; python_version >= "3.6" and python_full_version >= "3.6.0"
regex==2022.3.15; python_version >= "3.7" and python_full_version >= "3.6.0"
requests==2.27.1; python_full_version >= "3.6.0" and python_version >= "3.6"
sacremoses==0.0.49; python_full_version >= "3.6.0"
scipy==1.6.1; python_version >= "3.7"
seaborn==0.11.2; python_version >= "3.6"
setuptools-scm==6.4.2; python_version >= "3.7"
six==1.16.0; python_full_version >= "3.6.0" and python_version >= "3.8"
smart-open==5.2.1; python_version >= "3.6" and python_version < "4.0"
spacy-legacy==3.0.9; python_version >= "3.6"
spacy-loggers==1.0.1; python_version >= "3.6"
spacy==3.2.3; python_version >= "3.6"
srsly==2.4.2; python_version >= "3.6"
syllables==1.0.3; python_version >= "2.7"
thinc==8.0.15; python_version >= "3.6"
tokenizers==0.11.6; python_full_version >= "3.6.0"
tomli==2.0.1; python_version >= "3.7"
tqdm==4.63.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
transformers==4.17.0; python_full_version >= "3.6.0"
typer==0.4.0; python_version >= "3.6"
typing-extensions==4.1.1; python_full_version >= "3.6.1" and python_version >= "3.6"
urllib3==1.26.9; python_full_version >= "3.6.0" and python_version < "4" and python_version >= "3.6"
wasabi==0.9.0; python_version >= "3.6"
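Each pinned line above has the shape `name==version; marker`, where the part after the semicolon is an environment marker that limits when the pin applies. As a rough illustration of that shape only, the sketch below splits one line into its three parts with plain string operations. The helper name `parse_pin` is hypothetical, and the sketch does not evaluate markers; real tooling would normally rely on the `packaging` library for that.

```python
from typing import Optional, Tuple


def parse_pin(line: str) -> Tuple[str, str, Optional[str]]:
    """Split 'name==version; marker' into (name, version, marker or None)."""
    # Everything after the first ';' is the environment marker, if any.
    spec, _, marker = line.partition(";")
    # The pin itself uses '==' between the package name and the version.
    name, _, version = spec.strip().partition("==")
    return name.strip(), version.strip(), marker.strip() or None
```

A line without a marker, such as `lexical-diversity==0.1.1`, yields `None` in the third position.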
[metadata]
name = texty
version = 0.0.1
author = Alba María Mármol
author_email = ammarmol@ujaen.es
description = Text analysis and processing package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy
project_urls =
    Bug Tracker = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy/issues
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent
[options]
package_dir =
    = src
packages = find:
python_requires = >=3.6
[options.entry_points]
console_scripts =
    analyze-complexity = texty.analyze_complexity:analyze_complexity
[options.packages.find]
where = src
\ No newline at end of file
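The `[options.entry_points]` section maps a command name to a `module:function` target, so installing the package should expose an `analyze-complexity` command that calls `analyze_complexity` in `texty.analyze_complexity`. The sketch below reads that mapping back with the standard-library `configparser`, as a way to check what the section declares. The helper name `console_scripts` and the inlined config string are assumptions for illustration; this is not how setuptools itself processes the file.

```python
import configparser

# Assumption: only the entry-points section is inlined here, with the
# continuation line indented as setup.cfg requires.
SETUP_CFG = """
[options.entry_points]
console_scripts =
    analyze-complexity = texty.analyze_complexity:analyze_complexity
"""


def console_scripts(cfg_text: str) -> dict:
    """Return {command: (module, function)} for each declared console script."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", "console_scripts")
    scripts = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # the value starts with an empty first line
        command, target = (part.strip() for part in line.split("=", 1))
        module, _, function = target.partition(":")
        scripts[command] = (module, function)
    return scripts


if __name__ == "__main__":
    print(console_scripts(SETUP_CFG))
```

The indentation of the continuation line matters: without it, `configparser` would treat the script declaration as a separate key instead of part of the `console_scripts` value.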