Updated INSTALL.md and README.md, removed unused files

Commit 6e5a4700 authored Mar 28, 2022 by Jaime Collado
Showing with 71 additions and 52 deletions
INSTALL.md
README.md
__pycache__/TextAnalysisSpacy.cpython-36.pyc
examples/example.ipynb
examples/texts.csv
examples/texty_example.ipynb
requirements.txt
setup.cfg
--- a/INSTALL.md
+++ b/INSTALL.md
 # Installation instructions

-## Requirements
-
- python:
-  - pip install numpy
-  - pip install spacy
-  - pip install pandas
-  - pip install lexical-diversity
-  - pip install syllables
-  - pip install transformers
-
-## Installation procedure
-
-1. Install all the previous requirements.
-2. Download or clone this repository.
-
+1. Clone this repository: `git clone https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy.git`
+2. Use this inside the project's folder: `python -m pip install .`
\ No newline at end of file
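Review note: after running the two steps above, one quick way to confirm the install is to ask Python for the installed distribution's version. This is a minimal sketch, not part of the commit. It assumes the distribution is still named `texty` (the name declared in the `setup.cfg` shown further down), and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "texty" is an assumption taken from the removed setup.cfg metadata.
    version = installed_version("texty")
    print("texty is not installed" if version is None else f"texty {version}")
```

`importlib.metadata` is in the standard library from Python 3.8 onward.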
--- a/README.md
+++ b/README.md
@@ -5,26 +5,32 @@ This class provides methods for the calculation of different metrics on text. It
 # Files

 - [INSTALL.md](INSTALL.md): A guide to make this project work on your local environment.
- [TextAnalysisSpacy.py](TextAnalysisSpacy.py): This class provides methods for the calculation of different metrics on text.
- [TextComplexitySpacy.py](TextComplexitySpacy.py): This class provides methods for the calculation of different complexity metrics on text.
+
+## ./src/texty
+- [analyzer.py](src/texty/analyzer.py): This class provides methods for the calculation of different metrics on text.
+- [complexity.py](src/texty/complexity.py): This class provides methods for the calculation of different complexity metrics on text.
 - [CREA_total.txt](CREA_total.txt): A dataset of 737799 Spanish words ordered by absolute frequency.
- [example.ipynb](example.ipynb): Colab notebook that shows how to use the TextAnalisysSpacy class in several texts.
+- [analyze_complexity.py](src/texty/analyze_complexity.py): A script that takes a .txt file and an output format as input and generates a file containing all metrics as calculated by the ComplexityAnalyzer class.
+
+## ./examples
+- [example_text.txt](examples/example_text.txt): A simple .txt file to test the library.
+- [example.ipynb](examples/example.ipynb): Colab notebook that shows how to use the ComplexityAnalyzer class.


 # Metrics

 In this section, we introduce the different metrics offered in this Python library for different languages (Spanish, English). 

-* **Volumetry**: here it calculates the number of words, number of unique words, number of characters and average word length for text. Then it is calculated volumetrics for each category.
+* **Volumetry**: Number of words, number of unique words, number of characters and average word length for each text. Volumetrics for each category.

 * **Lemmas**: Number and length of different lemmas per text. Average and variance of different lemmas and length by category. Most frequent lemmas by category.

-* **Part-of-speech(POS)**:POS analysis for each text. POS analysis for each category. Most frequent words by POS.
+* **Part-of-speech (POS)**: POS analysis for each text. POS analysis for each category. Most frequent words by POS.

-* **Lexical_diversity**: Lexical diversity for each text (simple_TTR, root_TTR, log_TTR, maas_TTR, MSTTR, MATTR, HDD, MTLD). Lexical diversity for each category.
+* **Lexical diversity**: Lexical diversity for each text (simple_TTR, root_TTR, log_TTR, maas_TTR, MSTTR, MATTR, HDD, MTLD). Lexical diversity for each category.

-* **Complexity**:Complexity diversity for each category. Complexity diversity for each category.
+* **Complexity**: Complexity metrics for each category.

-* **FeatureSelection**: Remove features with low variance and SelectFromModel (Selection of functions based on L1)
+* **Feature selection**: Remove features with low variance and SelectFromModel (L1-based feature selection)

 * **kBest**: Selection of the k best features
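Review note: the README names the volumetry and lexical diversity metrics without saying how they are computed. The sketch below shows the textbook definitions of the volumetry counts and of three of the listed ratios (`simple_TTR`, `root_TTR`, `log_TTR`) in plain Python. It is an illustration only: the regex tokenizer and the function names `volumetry` and `ttr_variants` are assumptions, and the library itself may tokenize and compute these values differently.

```python
import math
import re


def _words(text):
    # Assumption: lowercase, word-character tokenization.
    return re.findall(r"\w+", text.lower())


def volumetry(text):
    """Word count, unique-word count, character count and mean word length."""
    words = _words(text)
    n_words = len(words)
    return {
        "words": n_words,
        "unique_words": len(set(words)),
        "characters": len(text),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


def ttr_variants(text):
    """Type-token ratios: types are distinct words, tokens are all words."""
    words = _words(text)
    tokens, types = len(words), len(set(words))
    if tokens < 2:
        # log(1) == 0 would make log_TTR undefined.
        raise ValueError("need at least two tokens")
    return {
        "simple_TTR": types / tokens,
        "root_TTR": types / math.sqrt(tokens),
        "log_TTR": math.log(types) / math.log(tokens),
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(volumetry(sample))
    print(ttr_variants(sample))
```

The simple ratio falls as texts get longer, which is why length-corrected variants such as the root and log forms are reported alongside it.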
--- a/__pycache__/TextAnalysisSpacy.cpython-36.pyc
+++ b/__pycache__/TextAnalysisSpacy.cpython-36.pyc
--- a/examples/example.ipynb
+++ b/examples/example.ipynb
--- a/examples/texts.csv
+++ b/examples/texts.csv
--- a/examples/texty_example.ipynb
+++ b/examples/texty_example.ipynb
--- a/requirements.txt
+++ b/requirements.txt
+blis==0.7.7; python_version >= "3.6"
+catalogue==2.0.7; python_version >= "3.6"
+certifi==2021.10.8; python_full_version >= "3.6.0" and python_version >= "3.6"
+charset-normalizer==2.0.12; python_full_version >= "3.6.0" and python_version >= "3.6"
+click==8.0.4; python_version >= "3.7" and python_full_version >= "3.6.0"
+colorama==0.4.4; python_full_version >= "3.6.0" and platform_system == "Windows" and python_version >= "3.7" and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
+cycler==0.11.0; python_version >= "3.7"
+cymem==2.0.6; python_version >= "3.6"
+filelock==3.6.0; python_version >= "3.7" and python_full_version >= "3.6.0"
+fonttools==4.31.2; python_version >= "3.7"
+huggingface-hub==0.4.0; python_full_version >= "3.6.0"
+idna==3.3; python_full_version >= "3.6.0" and python_version >= "3.6"
+jinja2==3.1.1; python_version >= "3.7"
+joblib==1.1.0; python_version >= "3.7" and python_full_version >= "3.6.0"
+kiwisolver==1.4.1; python_version >= "3.7"
+langcodes==3.3.0; python_version >= "3.6"
+lexical-diversity==0.1.1
+markupsafe==2.1.1; python_version >= "3.7"
+matplotlib==3.5.1; python_version >= "3.7"
+murmurhash==1.0.6; python_version >= "3.6"
+nltk==3.7; python_version >= "3.7"
+numpy==1.22.3; python_version >= "3.8"
+packaging==21.3; python_version >= "3.7" and python_full_version >= "3.6.0"
+pandas==1.4.1; python_version >= "3.8"
+pathy==0.6.1; python_version >= "3.6"
+pillow==9.0.1; python_version >= "3.7"
+preshed==3.0.6; python_version >= "3.6"
+pydantic==1.8.2; python_full_version >= "3.6.1" and python_version >= "3.6"
+pyparsing==3.0.7; python_version >= "3.7" and python_full_version >= "3.6.0"
+python-dateutil==2.8.2; python_version >= "3.8" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.8"
+pytz==2022.1; python_version >= "3.8"
+pyyaml==6.0; python_version >= "3.6" and python_full_version >= "3.6.0"
+regex==2022.3.15; python_version >= "3.7" and python_full_version >= "3.6.0"
+requests==2.27.1; python_full_version >= "3.6.0" and python_version >= "3.6"
+sacremoses==0.0.49; python_full_version >= "3.6.0"
+scipy==1.6.1; python_version >= "3.7"
+seaborn==0.11.2; python_version >= "3.6"
+setuptools-scm==6.4.2; python_version >= "3.7"
+six==1.16.0; python_full_version >= "3.6.0" and python_version >= "3.8"
+smart-open==5.2.1; python_version >= "3.6" and python_version < "4.0"
+spacy-legacy==3.0.9; python_version >= "3.6"
+spacy-loggers==1.0.1; python_version >= "3.6"
+spacy==3.2.3; python_version >= "3.6"
+srsly==2.4.2; python_version >= "3.6"
+syllables==1.0.3; python_version >= "2.7"
+thinc==8.0.15; python_version >= "3.6"
+tokenizers==0.11.6; python_full_version >= "3.6.0"
+tomli==2.0.1; python_version >= "3.7"
+tqdm==4.63.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
+transformers==4.17.0; python_full_version >= "3.6.0"
+typer==0.4.0; python_version >= "3.6"
+typing-extensions==4.1.1; python_full_version >= "3.6.1" and python_version >= "3.6"
+urllib3==1.26.9; python_full_version >= "3.6.0" and python_version < "4" and python_version >= "3.6"
+wasabi==0.9.0; python_version >= "3.6"
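Review note: every line added to `requirements.txt` has the shape `name==version`, optionally followed by `;` and an environment marker. The sketch below splits such a line into its three parts using only string operations. `parse_pinned_requirement` is a hypothetical helper written for this note. It does not evaluate the marker and is not a full requirement parser.

```python
def parse_pinned_requirement(line):
    """Split 'name==version; marker' into (name, version, marker or None)."""
    spec, _, marker = line.partition(";")
    name, sep, version = spec.strip().partition("==")
    if not sep:
        raise ValueError(f"not a pinned requirement: {line!r}")
    return name.strip(), version.strip(), marker.strip() or None


if __name__ == "__main__":
    for raw in (
        'spacy==3.2.3; python_version >= "3.6"',
        "lexical-diversity==0.1.1",
    ):
        print(parse_pinned_requirement(raw))
```

pip skips a requirement whose marker evaluates to false, so the pins marked `python_version >= "3.8"` (for example `numpy` and `pandas`) are not installed on older interpreters.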
--- a/setup.cfg
+++ b/setup.cfg
-[metadata]
-name = texty
-version = 0.0.1
-author = Alba María Mármol
-author_email = ammarmol@ujaen.es
-description = Text analysis and processing package
-long_description = file: README.md
-long_description_content_type = text/markdown
-url = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy
-project_urls =
-    Bug Tracker = https://gitlab.ujaen.es/ammr0032/TextAnalysisSpacy/issues
-classifiers =
-    Programming Language :: Python :: 3
-    License :: OSI Approved :: MIT License
-    Operating System :: OS Independent
-
-[options]
-package_dir =
-    = src
-packages = find:
-python_requires = >=3.6
-
-[options.entry_points]
-console_scripts =
-    analyze-complexity = texty.analyze_complexity:analyze_complexity
-
-[options.packages.find]
-where = src
\ No newline at end of file
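Review note: the removed `setup.cfg` declared one console script, `analyze-complexity`, pointing at `texty.analyze_complexity:analyze_complexity`. The sketch below reads that declaration with the standard-library `configparser` and splits it into a command name, a module and a function. The `SETUP_CFG` string copies only the relevant section from the diff above, and `console_scripts` is a hypothetical helper, not something the project provides.

```python
import configparser

# Only the entry-point section of the removed file, copied from the diff.
SETUP_CFG = """\
[options.entry_points]
console_scripts =
    analyze-complexity = texty.analyze_complexity:analyze_complexity
"""


def console_scripts(cfg_text):
    """Map each declared command name to its (module, function) target."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # the value starts with an empty first line
        command, _, target = line.partition("=")
        module, _, function = target.strip().partition(":")
        scripts[command.strip()] = (module, function)
    return scripts


if __name__ == "__main__":
    print(console_scripts(SETUP_CFG))
```

Because the commit deletes this file, the `analyze-complexity` command remains available after installation only if the same entry point is declared in another packaging file.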