Metadata-Version: 2.1
Name: text-hammer
Version: 0.1.4
Summary: A text preprocessing package
Home-page: UNKNOWN
Author: Abhishek Jaiswal
Author-email: abhishek.jaiswal26102001@gmail.com
License: Apache License 2.0
Description: #### Dependencies
        ```
        pip install spacy==2.2.3
        python -m spacy download en_core_web_sm
        pip install beautifulsoup4==4.9.1
        pip install textblob==0.15.3
        ```
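        
        These versions are pinned exactly. The sketch below is one minimal way to confirm what is actually installed. It assumes Python 3.8+ (for `importlib.metadata`). The helper name `check_pins` is hypothetical and not part of text_hammer, and `en_core_web_sm` is left out because the list above does not pin a version for it.
        
        ```python
        from importlib.metadata import PackageNotFoundError, version
        
        # Pins copied from the dependency list above.
        PINS = {"spacy": "2.2.3", "beautifulsoup4": "4.9.1", "textblob": "0.15.3"}
        
        
        def check_pins(pins):
            """Return {package: installed_version_or_None} for every pin that is not met."""
            problems = {}
            for name, wanted in pins.items():
                try:
                    installed = version(name)
                except PackageNotFoundError:
                    installed = None  # package is not installed at all
                if installed != wanted:
                    problems[name] = installed
            return problems
        
        
        print(check_pins(PINS))  # {} when every pinned version is installed
        ```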
        
        
        #### Installation
        ```
        pip install text_hammer
        ```
        
        
        #### How to use it for preprocessing
        
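        You need Python 3 and the dependencies listed above installed for this to work.
        ```
        import re
        
        import text_hammer as th
        
        
        def get_clean(x):
            """Run the full cleaning pipeline on a single piece of text."""
            x = str(x).lower().replace('\\', '').replace('_', ' ')
            x = th.cont_exp(x)  # expand contractions: you're -> you are
            x = th.remove_emails(x)
            x = th.remove_urls(x)
            x = th.remove_html_tags(x)
            x = th.remove_rt(x)  # remove "rt" (retweet) tokens
            x = th.remove_accented_chars(x)
            x = th.remove_special_chars(x)
            x = re.sub(r"(.)\1{2,}", r"\1", x)  # collapse 3+ repeated chars: sooo -> so
            return x
        ```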
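        
        To see roughly what the regex-based steps do without installing the package, here is a standard-library-only sketch. The regular expressions and the name `sketch_clean` are assumptions chosen for illustration and are NOT text_hammer's own implementations. Contraction expansion (`cont_exp`) is left out because it needs a contraction dictionary.
        
        ```python
        import re
        import unicodedata
        
        # Hypothetical, simplified stand-ins for several get_clean steps.
        # These are NOT text_hammer's implementations.
        EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
        URL_RE = re.compile(r"(?:https?://|www\.)\S+")
        HTML_TAG_RE = re.compile(r"<[^>]+>")
        RT_RE = re.compile(r"\brt\b")
        SPECIAL_RE = re.compile(r"[^a-z0-9\s]")
        REPEAT_RE = re.compile(r"(.)\1{2,}")
        
        
        def sketch_clean(text):
            x = str(text).lower().replace("\\", "").replace("_", " ")
            x = EMAIL_RE.sub("", x)      # drop e-mail addresses
            x = URL_RE.sub("", x)        # drop http(s):// and www. links
            x = HTML_TAG_RE.sub("", x)   # drop <tags> but keep the text between them
            x = RT_RE.sub("", x)         # drop standalone "rt" tokens
            # Strip accents: decompose (é -> e + combining accent), then drop non-ASCII.
            x = unicodedata.normalize("NFKD", x).encode("ascii", "ignore").decode("ascii")
            x = SPECIAL_RE.sub("", x)    # keep only a-z, 0-9 and whitespace
            x = REPEAT_RE.sub(r"\1", x)  # collapse runs of 3+ identical characters
            return " ".join(x.split())   # normalise whitespace
        
        
        print(sketch_clean("RT <b>Sooooo</b> good!!! Café me@example.com https://example.com"))
        # so good cafe
        ```
        
        The order matters: accents are stripped before special characters are removed, so `café` becomes `cafe` rather than `caf`.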
        
        Alternatively, apply the functions one at a time to a pandas column:
        ```
        import pandas as pd
        import text_hammer as th
        
        df = pd.read_csv('imdb_reviews.txt', sep='\t', header=None)
        df.columns = ['reviews', 'sentiment']
        
        # A series of preprocessing steps, applied in order
        df['reviews'] = df['reviews'].apply(th.cont_exp)  # you're -> you are; i'm -> i am
        df['reviews'] = df['reviews'].apply(th.remove_emails)
        df['reviews'] = df['reviews'].apply(th.remove_html_tags)
        df['reviews'] = df['reviews'].apply(th.remove_urls)
        
        # Same order as get_clean: strip accents before removing special characters
        df['reviews'] = df['reviews'].apply(th.remove_accented_chars)
        df['reviews'] = df['reviews'].apply(th.remove_special_chars)
        df['reviews'] = df['reviews'].apply(th.make_base)  # base form: ran -> run
        # seplling -> spelling; str() keeps the whole text, not just the first sentence
        df['reviews'] = df['reviews'].apply(lambda x: str(th.spelling_correction(x)))
        ```
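        
        Each `.apply` call above makes a separate pass over the column. A minimal alternative chains the steps with `functools.reduce` so each row goes through all of them in one pass. The `run_pipeline` helper is hypothetical and not part of text_hammer, and plain string methods stand in for the text_hammer functions so the example runs on its own.
        
        ```python
        from functools import reduce
        
        
        def run_pipeline(text, steps):
            """Apply each str -> str function in `steps` to `text`, in order."""
            return reduce(lambda acc, step: step(acc), steps, text)
        
        
        # Standard-library string methods stand in for the text_hammer steps here.
        print(run_pipeline("  Hello World  ", [str.strip, str.lower]))  # hello world
        ```
        
        With text_hammer installed, the same helper could be used as `df['reviews'] = df['reviews'].apply(run_pipeline, steps=[th.cont_exp, th.remove_emails, th.remove_html_tags, th.remove_urls])`, since `Series.apply` forwards extra keyword arguments to the function. The order of `steps` still matters, as in the line-by-line version.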
        
        Note: Avoid using `make_base` and `spelling_correction` on very large datasets. They are much slower than the other steps and can take hours to run.
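        
        When the column contains repeated texts (duplicate reviews, short labels, templated messages), one way to cut the cost is to run the slow step once per distinct value and reuse the result. The sketch assumes nothing about text_hammer. `apply_once_per_unique` and `slow_step` are hypothetical names, and the trick saves nothing when every review is unique.
        
        ```python
        def apply_once_per_unique(values, func):
            """Call `func` once per distinct value and reuse the result for repeats."""
            cache = {}
            out = []
            for v in values:
                if v not in cache:
                    cache[v] = func(v)  # the slow call happens only on first sight
                out.append(cache[v])
            return out
        
        
        calls = []
        
        
        def slow_step(text):
            # Hypothetical stand-in for an expensive step such as make_base.
            calls.append(text)
            return text.upper()
        
        
        print(apply_once_per_unique(["good movie", "bad movie", "good movie"], slow_step))
        # ['GOOD MOVIE', 'BAD MOVIE', 'GOOD MOVIE']
        print(len(calls))  # 2
        ```
        
        In pandas the same idea is roughly `df['reviews'] = df['reviews'].map({t: th.make_base(t) for t in df['reviews'].unique()})`.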
        
        
        #### Extra
        
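        Collapse any character repeated three or more times into a single copy:
        ```
        import re
        
        x = 'lllooooovvveeee youuuu'
        x = re.sub(r"(.)\1{2,}", r"\1", x)  # collapse runs of 3+ identical characters
        print(x)  # love you
        ```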
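        
        The pattern `(.)\1{2,}` captures one character and then requires at least two more copies of it, so only runs of three or more are touched and ordinary double letters such as `hello` are left alone. A side effect worth knowing: a double letter that has been stretched is reduced to one, so `goooood` becomes `god`. The `squash_to_two` variant below is a hypothetical alternative, not part of text_hammer, that keeps two copies instead, at the cost of leaving `lloovvee` for the example above.
        
        ```python
        import re
        
        # (.) captures one character; \1{2,} needs at least two more copies of it.
        REPEATS = re.compile(r"(.)\1{2,}")
        
        
        def squash_to_one(text):
            return REPEATS.sub(r"\1", text)
        
        
        def squash_to_two(text):
            # Hypothetical variant: keep two copies of the repeated character.
            return REPEATS.sub(r"\1\1", text)
        
        
        print(squash_to_one("lllooooovvveeee youuuu"))  # love you
        print(squash_to_one("goooood"))                 # god
        print(squash_to_two("goooood"))                 # good
        print(squash_to_one("hello"))                   # hello
        ```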
Platform: UNKNOWN
Description-Content-Type: text/markdown
