May 2025

Another Review Classification System, an AI review analyzer to determine if reviews are positive or negative.
ARCS (Another Review Classification System) is a lightweight and modular Python command-line tool for analyzing customer reviews stored in CSV files. Its core workflow is designed to take a column of free-text reviews, (optionally) filter those reviews by semantic relevance to a topic, classify the remaining reviews by sentiment, and then export the results into clean, analysis-ready CSV files.
At a high level, the project combines two useful NLP tasks into a single practical pipeline: semantic relevance filtering and sentiment classification.
The result of combining these tasks is a tool that turns raw review text into structured outputs that are easy to inspect, summarize, and use in downstream analysis.
I built ARCS as a practical way to test a workflow for analyzing large amounts of user feedback without having to manually read every review. The original concept was simple: first narrow the dataset to reviews that are actually about the topic of interest, then separate those reviews into what people liked and what they did not.
This project was especially useful as a way to validate an approach that could scale to hundreds or thousands of reviews. Rather than treating sentiment analysis as a standalone classification problem, ARCS treats it as part of a larger review-analysis pipeline: relevance first, sentiment second, structured output at the end. This wasn't meant to be a full end-to-end solution, but rather a proof of concept for a workflow that could be extended with more features (aspect-based sentiment, better neutral handling, reporting, etc.) in the future.
ARCS is built around four main capabilities:
The tool is designed to work with real-world CSV datasets. It reads review text from a CSV file, lets the user choose which column contains the reviews, and writes multiple output CSV files that can be inspected directly in Excel, pandas, or other analysis tools. This keeps the tool flexible and makes it easy to integrate with existing datasets and workflows.
Instead of forcing every review into a hard positive or negative label, ARCS uses a confidence threshold. If the model’s confidence is high enough, the predicted label is accepted. If it falls below the threshold, the review can be treated as uncertain and optionally exported separately.
This is an important design choice because review text is often messy, mixed, or ambiguous. A threshold-based system is more honest and more useful than pretending every prediction is equally reliable.
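The thresholding idea described above can be sketched in a few lines. This is a minimal illustration, assuming the model returns a label and a confidence score between 0 and 1. The function name `apply_threshold`, the `"uncertain"` label, and the 0.75 default are assumptions made for the example, not values taken from the ARCS source.

```python
UNCERTAIN = "uncertain"  # hypothetical label for low-confidence predictions


def apply_threshold(label: str, confidence: float, threshold: float = 0.75) -> str:
    """Keep the predicted label only if confidence is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return label if confidence >= threshold else UNCERTAIN


# Toy (label, confidence) pairs standing in for real model output.
predictions = [("positive", 0.98), ("negative", 0.91), ("positive", 0.55)]
final_labels = [apply_threshold(lbl, conf, threshold=0.85) for lbl, conf in predictions]
print(final_labels)  # ['positive', 'negative', 'uncertain']
```

Raising the threshold trades coverage for reliability: fewer reviews get a hard label, but the ones that do are more likely to be correct.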
ARCS supports more than one sentiment model behind a shared interface:
distilbert-base-uncased-finetuned-sst-2-english
This makes the project modular rather than tightly coupled to one implementation. More models can be added later without breaking the CLI, so if this were expanded to a production environment, it would be easy to swap in a more powerful or domain-specific model.
If a topic is provided, ARCS computes semantic similarity between each review and the topic string using sentence embeddings. Reviews below a configurable relevance threshold are filtered out before sentiment classification.
This enables targeted analysis such as:
ARCS is structured around a clear separation of concerns, which makes the codebase easy to extend and maintain.
| Component | Description |
|---|---|
| main.py | CLI entry point |
| processing/classifier_engine.py | Orchestrates the review analysis workflow |
| relevance/relevance_scorer.py | Handles topic relevance scoring |
| models/ | Contains sentiment model implementations |
| processing/io_handler.py | Handles CSV reading and writing |
| utils/ | Contains helper logic for text cleaning and CLI formatting |
| config.py | Stores shared configuration values |
That flow makes the project feel like a real tool rather than a one-off script.
ARCS/
├── main.py
├── config.py
├── requirements.txt
├── example_reviews.csv
├── models/
│ ├── sentiment_base.py
│ ├── bert_sentiment.py
│ └── dummy_sentiment.py
├── processing/
│ ├── classifier_engine.py
│ └── io_handler.py
├── relevance/
│ └── relevance_scorer.py
├── utils/
│ └── text_utils.py
├── tests/
│ ├── test_classifier.py
│ ├── test_relevance.py
│ ├── test_combined.py
│ ├── example_diverse_reviews.csv
│ ├── run_test.sh
│ └── run_test.bat
├── docs/
└── outputs/
This structure is one of the project’s strengths. It keeps modeling logic, orchestration, input/output, and utility code separate, which makes it easier to replace components or expand functionality later. This was probably the first time I approached project structure this way, and it was a big win for me.
ARCS expects a CSV file containing review text.
By default, it looks for a column named review_text, though this can be changed with --review-column.
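A rough sketch of what reading a configurable review column could look like, using only the standard library. The helper name `read_reviews` and the choice to skip empty cells are assumptions for illustration. The actual logic lives in processing/io_handler.py and may differ.

```python
import csv
import io
from typing import List


def read_reviews(csv_file, review_column: str = "review_text") -> List[str]:
    """Read non-empty review strings from the chosen column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or review_column not in reader.fieldnames:
        raise KeyError(
            f"Column {review_column!r} not found; available: {reader.fieldnames}"
        )
    reviews = []
    for row in reader:
        text = (row[review_column] or "").strip()
        if text:  # assumed behavior: silently skip blank reviews
            reviews.append(text)
    return reviews


# In-memory CSV standing in for a real file opened with open(path, newline="").
sample = io.StringIO("id,comments\n1,Great exhibits\n2,\n3,Too crowded\n")
reviews = read_reviews(sample, review_column="comments")
print(reviews)  # ['Great exhibits', 'Too crowded']
```

Failing early with the list of available columns is a small touch that saves time when the input file uses a different header than expected.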
The tool generates multiple structured CSV files depending on the run configuration:
all_reviews_with_confidence.csv
Master file containing sentiment labels, confidence scores, and relevance metadata.
positive_reviews.csv
Reviews classified as positive with confidence at or above the threshold.
negative_reviews.csv
Reviews classified as negative with confidence at or above the threshold.
neutral_or_uncertain_reviews.csv
Optional file for reviews below the confidence threshold.
irrelevant_reviews.csv
Optional file for reviews filtered out by topic relevance.
This output design is intentionally analysis-friendly. Instead of giving one opaque prediction file, ARCS creates clean subsets that can immediately be counted, sorted, summarized, or visualized.
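To make the output layout concrete, here is a small sketch that splits classified rows into the files named above. The row keys (`review_text`, `sentiment`, `confidence`), the helper names, and the 0.75 default threshold are assumptions. Unlike ARCS, where the uncertain file is optional, this sketch writes it whenever such rows exist.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["review_text", "sentiment", "confidence"]  # assumed column names


def bucket_for(row: dict, threshold: float) -> str:
    """Pick the output file name for one classified row."""
    if row["confidence"] < threshold:
        return "neutral_or_uncertain_reviews.csv"
    return f"{row['sentiment']}_reviews.csv"


def write_outputs(rows: list, out_dir: Path, threshold: float = 0.75) -> dict:
    """Write the master file plus one CSV per bucket; return row counts per file."""
    buckets = {"all_reviews_with_confidence.csv": list(rows)}
    for row in rows:
        buckets.setdefault(bucket_for(row, threshold), []).append(row)
    for name, subset in buckets.items():
        with open(out_dir / name, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(subset)
    return {name: len(subset) for name, subset in buckets.items()}


rows = [
    {"review_text": "Loved it", "sentiment": "positive", "confidence": 0.97},
    {"review_text": "Awful staff", "sentiment": "negative", "confidence": 0.93},
    {"review_text": "It was fine I guess", "sentiment": "positive", "confidence": 0.58},
]
with tempfile.TemporaryDirectory() as tmp:
    counts = write_outputs(rows, Path(tmp))
print(counts)
```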
python main.py --input reviews.csv
python main.py --input reviews.csv --threshold 0.85
python main.py --input reviews.csv --include-neutral
python main.py --input reviews.csv --topic "museum exhibits"
python main.py --input reviews.csv --topic "customer service" --relevance-threshold 0.6
python main.py --input reviews.csv --model dummy
The text preprocessing is lightweight and deterministic. Reviews are cleaned by normalizing casing, removing HTML and URLs, stripping punctuation, and normalizing whitespace.
This is a good design choice for a CLI tool like ARCS. The preprocessing reduces noise without making the pipeline hard to reason about.
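The cleaning steps listed above map naturally onto a short function. The following is a sketch of that idea using only the standard library. The name `clean_text`, the regular expressions, and the order of the steps are assumptions, not a copy of what utils/text_utils.py does.

```python
import re
import string

_HTML_TAG = re.compile(r"<[^>]+>")
_URL = re.compile(r"https?://\S+|www\.\S+")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, drop HTML tags and URLs, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = _HTML_TAG.sub(" ", text)       # remove tags before URLs so "</p>" is not glued to a link
    text = _URL.sub(" ", text)
    text = text.translate(_PUNCT_TABLE)   # deletes ASCII punctuation, including apostrophes
    return " ".join(text.split())         # normalize runs of whitespace to single spaces


print(clean_text("<p>GREAT   museum!!! See https://example.com/info</p>"))
# great museum see
```

Because every step is a plain string operation, the same input always yields the same output, which is what keeps the pipeline easy to reason about. One tradeoff worth knowing: stripping punctuation turns "don't" into "dont", which a transformer tokenizer may handle differently from the original word.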
The sentiment model is selected by name and loaded dynamically. This means the engine does not need to be rewritten every time a new classifier is added.
That makes ARCS easy to extend. As long as a new model follows the shared interface, it can plug into the existing pipeline.
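One common way to get this kind of by-name loading is an abstract base class plus a small registry. The sketch below shows the pattern. The class names, the `register` decorator, and `load_model` are all hypothetical, and ARCS itself may resolve model names differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class SentimentModel(ABC):
    """Shared interface: every backend maps texts to (label, confidence) pairs."""

    @abstractmethod
    def predict(self, texts: List[str]) -> List[Tuple[str, float]]:
        ...


MODEL_REGISTRY: Dict[str, Type[SentimentModel]] = {}


def register(name: str):
    """Class decorator that makes a model selectable by name."""
    def decorator(cls: Type[SentimentModel]) -> Type[SentimentModel]:
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register("always_positive")
class AlwaysPositive(SentimentModel):
    """Trivial stand-in backend, used here only to show the plug-in mechanism."""

    def predict(self, texts: List[str]) -> List[Tuple[str, float]]:
        return [("positive", 1.0) for _ in texts]


def load_model(name: str) -> SentimentModel:
    """Instantiate a registered model, failing clearly on unknown names."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model {name!r}; choose from {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]()


model = load_model("always_positive")
print(model.predict(["nice place", "long queue"]))
```

With this layout, adding a backend means writing one class and one decorator line. The code that calls `load_model` never changes.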
Both relevance scoring and model inference are designed to run in batches, which matters when the input dataset grows beyond small demos. This keeps the project practical for larger review datasets and makes the CLI feel responsive rather than fragile.
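Batching itself is simple to illustrate. The helper below is a generic sketch, not ARCS code, and the name `batched` and the batch size are assumptions. It slices a list of reviews into fixed-size chunks that a model or embedding call could process one at a time.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


reviews = [f"review {i}" for i in range(7)]
batch_sizes = [len(batch) for batch in batched(reviews, batch_size=3)]
print(batch_sizes)  # [3, 3, 1]
```

Processing one chunk at a time keeps memory use bounded and gives a natural place to update a progress indicator between batches.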
One subtle strength of the project is that it pays attention to the command-line user experience. ARCS prints structured sections, statistics, and progress indicators so the run feels understandable rather than like a black box.
ARCS currently supports two sentiment backends:
The main model uses a pretrained DistilBERT sentiment classifier from Hugging Face. This gives the project a modern transformer-based backend with good performance on general sentiment tasks.
Best for: higher-quality predictions, realistic sentiment analysis workflows
Tradeoff: slower runtime and heavier dependencies
The dummy model is a lightweight keyword-based implementation intended for testing and fast iteration.
Best for: quick validation, environments without large downloads, simple demos
Tradeoff: much lower accuracy and limited nuance
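For a sense of how little a keyword-based baseline needs, here is a minimal sketch. The word lists, the function name `keyword_sentiment`, and the confidence formula are invented for illustration and are not the contents of models/dummy_sentiment.py.

```python
import re
from typing import Tuple

# Tiny hypothetical lexicons; a real baseline would use larger lists.
POSITIVE_WORDS = {"great", "love", "loved", "excellent", "amazing", "friendly"}
NEGATIVE_WORDS = {"bad", "terrible", "awful", "rude", "boring", "dirty"}


def keyword_sentiment(text: str) -> Tuple[str, float]:
    """Label text by counting sentiment keywords; confidence is the winning share."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    total = pos + neg
    if total == 0:
        # No keywords found: the label is arbitrary, so report the lowest confidence.
        return "positive", 0.5
    label = "positive" if pos >= neg else "negative"
    return label, max(pos, neg) / total


print(keyword_sentiment("The staff were friendly and the exhibits were amazing"))
print(keyword_sentiment("Terrible lines and rude staff, but great art"))
```

The second review mixes two negative keywords with one positive one, so it comes back negative with a confidence of about 0.67. Under a confidence threshold, that is exactly the kind of review that would land in the uncertain bucket.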
Including both models is a strong engineering decision because it supports both development speed and production-style experimentation.
It is worth noting that while the BERT model is the primary backend, the Dummy baseline was added to demonstrate the flexibility of adding future models. This means that in the future, it would be straightforward to add more powerful or domain-specific sentiment classifiers without changing the overall architecture of the tool.
One of the most interesting parts of ARCS is the optional relevance layer. If a topic is provided, the tool computes sentence embeddings for the reviews and the topic string, then compares them using cosine similarity.
This matters because sentiment alone is often too broad. In a real review dataset, only some reviews may be about the aspect the analyst actually cares about. Relevance filtering solves that by narrowing the dataset before sentiment classification happens.
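The relevance step boils down to a cosine similarity check against a threshold. The sketch below shows that check in plain Python. The three-dimensional vectors are toy stand-ins for real sentence embeddings, and the function names and the 0.6 threshold are assumptions for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_relevance(
    review_vectors: Dict[str, Sequence[float]],
    topic_vector: Sequence[float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep reviews whose embedding is at least `threshold` similar to the topic."""
    return [
        text
        for text, vector in review_vectors.items()
        if cosine_similarity(vector, topic_vector) >= threshold
    ]


# Toy vectors; in practice these would come from a sentence-embedding model.
topic_vector = [1.0, 0.0, 0.0]
review_vectors = {
    "The dinosaur exhibit was stunning": [0.9, 0.1, 0.0],
    "Parking was expensive": [0.1, 0.9, 0.2],
}
relevant = filter_by_relevance(review_vectors, topic_vector, threshold=0.6)
print(relevant)  # ['The dinosaur exhibit was stunning']
```

Only the reviews that pass this filter would go on to sentiment classification. The rest could be written to irrelevant_reviews.csv for inspection.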
Conceptually, the workflow becomes:
That makes the final output much more actionable.
The repository includes test and demo scripts under tests/ for three main scenarios:
This is a useful part of the project because it supports quick iteration on thresholds, model selection, and sample datasets without requiring the full CLI setup every time. It also makes the repo easier for other people to understand and try out.
Several design decisions make ARCS stronger out of the box than a typical quick-and-dirty NLP script:
Together, these choices make ARCS feel like a practical NLP utility rather than just a classroom demo.
If I were extending ARCS further, these would be the highest-priority improvements:
Instead of only classifying overall sentiment, identify which aspects are being discussed and attach sentiment to each one. That would make outputs much more useful for product or service analysis.
Right now, “neutral” is effectively represented as low confidence. That is pragmatic, but not ideal. A true three-class or mixed-sentiment setup would better distinguish:
There is clear room to optimize repeated runs:
A reporting layer would make the tool more complete. Useful additions would include:
Optional quality-of-life support also includes terminal formatting enhancements and documentation under docs/.
pip install -r requirements.txt
python main.py --input example_reviews.csv --topic "museum exhibits" --include-neutral
ARCS is a modular, extensible review-analysis CLI that combines topic relevance filtering with sentiment classification to turn raw customer-review text into structured, analysis-ready outputs. Its strongest qualities are its clean architecture, confidence-aware predictions, practical CSV workflow, and clear separation between relevance scoring, model inference, and output generation.
ARCS was a strong exercise in building a practical NLP tool around real-world constraints: noisy text, high review volume, incomplete certainty, and the need to isolate specific themes before analyzing sentiment.
This project was probably one of the first times I've finished something and felt good enough about it to release it. It was a great learning experience, and I think that shows in the design and implementation choices. I'm really happy with how it turned out considering how little time I spent working on it. My biggest takeaways were that I learned a solid amount about NLP and machine learning and reinforced my understanding of project structure and robust design. Looking back, I originally planned to approach this as a simple test project, just to see whether the NLP approach I had in mind for another project would work, but the decisions I made allowed me to grow it beyond that. I'm glad I took the time to learn more about the subject and to see how it could be applied in a real-world setting.
Specifically, though, I would say that the biggest technical win in the project was combining semantic relevance filtering with confidence-aware sentiment classification in a single modular CLI workflow. That combination makes the outputs more targeted, more trustworthy, and more useful than a simple “run sentiment on everything” pipeline.
I am open to ANY questions about this project! Shoot me an email or a message on my LinkedIn, and I will be happy to chat about the data, the code, the design decisions, or anything else related to this project.
Parts of this project were developed in collaboration with generative AI.