Tokenize

Assignment 1A (25 points)

Due Date: 9/5 (11:59pm)

Description: In this assignment, you will write a Python program to read and tokenize
the data. In the training data format shown below, the first column is the reviewer ID, the
second column indicates whether the review is fake or true, the third column indicates
whether the review is positive or negative, and the rest of the line is the review text. Your
overall task is to learn, from the review text, whether each review is fake or true and
whether it is positive or negative.

Input Data

064BmtQ Fake Neg I was very disappointed with this hotel. I have stayed …
0Dh2p5S True Pos We stayed at the Palmer House Hilton …
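Here is a minimal sketch of splitting one line into its fields. It assumes the columns are separated by whitespace and that everything after the third column is the review text. The function name `parse_line` is hypothetical and is not part of the assignment.

```python
def parse_line(line):
    """Split one training line into (reviewer_id, labels, review).

    Assumes whitespace-separated columns: id, Fake/True, Pos/Neg, review text.
    """
    # maxsplit=3 keeps the whole review (which contains spaces) as one field
    reviewer_id, authenticity, sentiment, review = line.strip().split(maxsplit=3)
    return reviewer_id, [authenticity, sentiment], review


example = "064BmtQ Fake Neg I was very disappointed with this hotel."
print(parse_line(example))
# ('064BmtQ', ['Fake', 'Neg'], 'I was very disappointed with this hotel.')
```

A line with fewer than four fields (for example, a blank line) makes the unpacking raise `ValueError`, so a full program would need to skip such lines.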

Your first task is to read the data into Python objects.

• Extract the labels
[‘Fake’, ‘Neg’]

• Extract each review
I was very disappointed with … the chain’s reputation.

• Tokenize the sentences
[‘disappointed’, ‘hotel’, ‘stayed’, ‘swissotels’, ‘enjoyed’, ‘service’, ‘described’, ‘aloof’, ‘warmth’,
‘prolonged’, ‘checkin’, ‘procedure’, ‘woman’, ‘repeatedly’, ‘asked’, ‘provide’, ‘information’, ‘given’,
‘minutes’, ‘ago’, ‘precise’, ‘room’, ‘took’, ‘forever’, ‘pick’, ‘good’, ‘sign’, ‘way’, ‘busy’, ‘food’, ‘arrived’,
‘late’, ‘cold’, ‘man’, ‘tried’, ‘replace’, ‘hour’, ‘price’, ‘reduction’, ‘free’, ‘dessert’, ‘apologize’, ‘cleanliness’,
‘godly’, ‘knocked’, ‘door’, ‘0800’, ‘despite’, ‘fact’, ‘doorknocker’, ‘requesting’, ‘sleeper’, ‘stay’, ‘clearly’,
‘did’, ‘help’, ‘build’, ‘chain’, ‘reputation’]

• Store the extracted data in lists

• Repeat this for every review in the data

• Print out the first and the last labels from your stored list

• Print out the first and the last token lists (tokenized reviews) from your stored list
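The steps above can be sketched as follows. The sample token list appears to be lowercased, with punctuation removed and common function words (such as "I", "was", "with", "this") dropped. The tokenizer below imitates that behaviour with a regular expression and a small stop-word list. That list, and the rule for joining apostrophes and hyphens, are assumptions, so the output will not match the sample exactly. The names `tokenize`, `STOP_WORDS`, and `process` are illustrative only.

```python
import re

# Hypothetical, deliberately small stop-word list; a real solution might use a
# larger list (e.g. from NLTK or scikit-learn) instead.
STOP_WORDS = {
    "a", "an", "and", "the", "i", "was", "very", "with", "this", "have",
    "of", "to", "in", "it", "at", "for", "on", "is", "we", "my",
}


def tokenize(review):
    """Lowercase, join words split by apostrophes/hyphens, keep alphanumeric runs."""
    text = review.lower()
    # Assumed rule: "swissotel's" -> "swissotels", "check-in" -> "checkin"
    text = re.sub(r"['’-]", "", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [tok for tok in tokens if tok not in STOP_WORDS]


def process(lines):
    """Parse each well-formed line; store labels and token lists in parallel lists."""
    labels, token_lists = [], []
    for line in lines:
        fields = line.strip().split(maxsplit=3)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _reviewer_id, authenticity, sentiment, review = fields
        labels.append([authenticity, sentiment])
        token_lists.append(tokenize(review))
    return labels, token_lists


lines = [
    "064BmtQ Fake Neg I was very disappointed with this hotel.",
    "0Dh2p5S True Pos We stayed at the Palmer House Hilton.",
]
labels, token_lists = process(lines)
print(labels[0], labels[-1])            # first and last labels
print(token_lists[0], token_lists[-1])  # first and last tokenized reviews
```

Keeping labels and token lists in two parallel lists means index `i` in one refers to the same review as index `i` in the other. This makes "first" and "last" simply `[0]` and `[-1]`.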

To Run:
>> python learn.py training-text.txt
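This is a minimal sketch of the command-line entry point. It assumes `learn.py` receives the data file path as its only argument and that the file is UTF-8 text with one review per line. The helpers `read_data` and `split_lines` are hypothetical. Tokenization would then be applied to each returned review.

```python
import sys


def split_lines(lines):
    """Return parallel lists of labels and raw review texts from an iterable of lines."""
    labels, reviews = [], []
    for line in lines:
        fields = line.strip().split(maxsplit=3)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _reviewer_id, authenticity, sentiment, review = fields
        labels.append([authenticity, sentiment])
        reviews.append(review)
    return labels, reviews


def read_data(path):
    """Open the data file (assumed UTF-8) and split it into labels and reviews."""
    with open(path, encoding="utf-8") as f:
        return split_lines(f)


def main(argv):
    """Entry point: expects argv == ["learn.py", "<data file>"]."""
    if len(argv) != 2:
        print("usage: python learn.py training-text.txt", file=sys.stderr)
        return 1
    labels, reviews = read_data(argv[1])
    if not labels:
        print("no reviews found in", argv[1], file=sys.stderr)
        return 1
    print(labels[0], labels[-1])
    print(reviews[0])
    print(reviews[-1])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Separating `split_lines` from the file handling lets the parsing be tested on an in-memory list without creating a file.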

Submission:
Submit the Python program to Blackboard
