###
API
###


Normalization
=============

__class__ NormalizeText

* Attributes

	* stopwords - a list of English stopwords
	* spell_checker - a spell checker of type spellchecker_ml (see https://github.com/EricSchles/spellchecker_ml for more information)

__method__ add_to_stopwords(words: list)

Add the given words to the stopwords attribute.

usage::

	```
	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.add_to_stopwords(["Hello", "there"])
	normalizer.remove_stopwords("Hello there") # returns ""
	```

__method__ remove_stopwords(text: str) -> str

Remove every word found in the stopwords attribute from the text.

usage::

	```
	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.add_to_stopwords(["Hello", "there"])
	normalizer.remove_stopwords("Hello there") # returns ""
	```
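
The idea behind stop word removal can be sketched without the library. This is a minimal illustration, not the NormalizeText implementation: the standalone function, its whitespace tokenization, and its case-sensitive matching are all assumptions.

```python
def remove_stopwords(text, stopwords):
    """Return `text` with every token found in `stopwords` dropped."""
    # Assumption: tokens are whitespace-separated and matched case-sensitively.
    stopword_set = set(stopwords)  # set membership checks are O(1) per token
    kept = [token for token in text.split() if token not in stopword_set]
    return " ".join(kept)


print(repr(remove_stopwords("Hello there", ["Hello", "there"])))    # ''
print(remove_stopwords("Hello there friends", ["Hello", "there"]))  # friends
```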

__method__ normalize_case(text: str) -> str

Convert text to lower case.

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.normalize_case("Hello") # returns "hello"
	```

__method__ initialize_spellchecker(corpus: str, corpus_name: str, words: list, ner_text: str)

Initialize the spell checker with a corpus to train on and text to ignore, for more directed text correction.

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	with open("corpus.txt", "r") as f:
		corpus = f.read()

	normalizer = NormalizeText()
	normalizer.initialize_spellchecker(corpus=corpus, corpus_name="corpus", words=["Hello", "there"], ner_text=corpus)
	normalizer.make_spelling_correction("Hello there friendz") # returns "Hello there friends"
	```

__method__ strip_punctuation(word: str, return_punctuation: bool) -> str

Strips the punctuation from a word. If return_punctuation is True, the removed punctuation is returned as well, as a separate value.

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.strip_punctuation("Hello,") # returns "Hello"
	normalizer.strip_punctuation("Hello,", return_punctuation=True) # returns ("Hello", ",")
	```
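
A rough sketch of how punctuation stripping with an optional second return value might look. The standalone function, the `False` default for `return_punctuation`, the tuple return shape, and the use of `string.punctuation` (ASCII only) as the definition of punctuation are assumptions, not details confirmed by the library.

```python
import string


def strip_punctuation(word, return_punctuation=False):
    """Remove ASCII punctuation from `word`; optionally return it too."""
    letters = "".join(ch for ch in word if ch not in string.punctuation)
    if not return_punctuation:
        return letters
    # Collect the removed characters in their original order.
    removed = "".join(ch for ch in word if ch in string.punctuation)
    return letters, removed


print(strip_punctuation("Hello,"))                           # Hello
print(strip_punctuation("Hello,", return_punctuation=True))  # ('Hello', ',')
```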

__method__ correctly_spelled(word: str) -> bool

Checks whether a word is correctly spelled. Returns True if the word appears in spellchecker_ml's dictionary of words and False otherwise.

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.correctly_spelled("Helo") # returns False
	```
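
A dictionary-based check of this kind reduces to a membership test. The sketch below is illustrative only: the `known_words` set is a hypothetical stand-in for spellchecker_ml's dictionary, and lowercasing the word before the lookup is an assumption.

```python
def correctly_spelled(word, dictionary):
    """Return True only when the lowercased word is in `dictionary`."""
    return word.lower() in dictionary


known_words = {"hello", "there", "friends"}  # hypothetical tiny dictionary
print(correctly_spelled("Hello", known_words))  # True
print(correctly_spelled("Helo", known_words))   # False
```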

__method__ make_spelling_correction(text: str) -> str

Correct the spelling of a piece of text using a hidden Markov model (for now).

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	with open("corpus.txt", "r") as f:
		corpus = f.read()

	normalizer = NormalizeText()
	normalizer.initialize_spellchecker(corpus=corpus, corpus_name="corpus", words=["Hello", "there"], ner_text=corpus)
	normalizer.make_spelling_correction("Hello there friendz") # returns "Hello there friends"
	```

__method__ correct_whitespace(text: str) -> str

Normalizes whitespace so that tokens are separated by exactly one space.

usage::

	```

	from text_processing_ml.normalization import NormalizeText

	normalizer = NormalizeText()
	normalizer.correct_whitespace(" Hello  there friends  \t whatever")
	# returns "Hello there friends whatever"
	```
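
One common way to get this behaviour in plain Python is to split on whitespace and rejoin with single spaces. This is a sketch of the technique, not necessarily how NormalizeText does it.

```python
def correct_whitespace(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and discards leading/trailing whitespace.
    return " ".join(text.split())


print(correct_whitespace(" Hello  there friends  \t whatever"))
# Hello there friends whatever
```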

Matching
========


Parsing
=======

__class__ ParseText

* Attributes

	* stemmer - a Porter stemmer from NLTK
	* normalizer - the project's text normalizer (see Normalization)

__method__ stem_tokens(tokens: list) -> list

Returns a list of stemmed tokens. Stemming reduces a word to its root form.

Example:

runs -> run
jumping -> jump
flying -> fly

usage::

	```
	from text_processing_ml.parsing import ParseText

	parser = ParseText()
	parser.stem_tokens("Hello there friends".split()) # returns ["Hello", "there", "friend"]
	```

__method__ tokenize(text: str) -> list

Split a string into tokens and stem each token.

usage::

	```
	from text_processing_ml.parsing import ParseText

	parser = ParseText()
	parser.tokenize("Hello there friends") # ["Hello", "there", "friend"]
	```
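
The tokenize-then-stem pipeline can be sketched as below. The regular expression, the optional `stem` callable, and the toy `strip_plural` helper are all hypothetical; ParseText uses an NLTK Porter stemmer, which behaves differently from this stand-in.

```python
import re


def tokenize(text, stem=None):
    """Split `text` into word tokens, applying `stem` to each if given."""
    # \w+ keeps runs of letters, digits and underscores, so punctuation is
    # dropped and a contraction such as "what's" yields two tokens.
    tokens = re.findall(r"\w+", text)
    if stem is None:
        return tokens
    return [stem(token) for token in tokens]


def strip_plural(token):
    # Toy stand-in for a real stemmer: drop one trailing "s".
    return token[:-1] if token.endswith("s") else token


print(tokenize("Hello there friends"))                # ['Hello', 'there', 'friends']
print(tokenize("Hello there friends", strip_plural))  # ['Hello', 'there', 'friend']
```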

__method__ normalize_text(text: str) -> str

Normalize a piece of text by lower casing it and removing punctuation.

usage::

	```
	from text_processing_ml.parsing import ParseText

	parser = ParseText()
	parser.normalize_text("Hello there, friend") # returns "hello there friend"
	```
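
Lower casing plus punctuation removal can be done with the standard library alone. The sketch below assumes ASCII punctuation as defined by `string.punctuation`; the set of characters the library actually removes is not documented here.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize_text(text):
    """Lower-case `text` and delete ASCII punctuation."""
    return text.lower().translate(_PUNCTUATION_TABLE)


print(normalize_text("Hello there, friend"))  # hello there friend
```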

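__method__ tfidf(documents: list)

Returns the TF-IDF (term frequency-inverse document frequency) weights for a list of documents. Each term's frequency within a document is scaled by how rare the term is across all documents.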

usage::
	
	```
	from text_processing_ml.parsing import ParseText

	parser = ParseText()
	parser.tfidf(["Hello there friends", "How are you doing?", "what's up"])
	# returns the TF-IDF weighted term matrix
	```
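
To make the weighting concrete, here is a minimal pure-Python sketch of one common TF-IDF variant: tf is a term's count divided by the document length, and idf is log(N / df). The list-of-dicts return type, the whitespace tokenization, and the unsmoothed idf are assumptions; the library may use a different variant and return a matrix instead.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["hello there friends", "hello again", "goodbye friends"]
scores = tfidf(docs)
# "there" occurs in one document, "hello" in two, so "there" weighs more.
print(scores[0]["there"] > scores[0]["hello"])  # True
```

A term that appears in every document gets an idf of log(1) = 0 under this variant, so it contributes nothing to the weights.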