Movie genre classification using machine learning
Could you review the code below, which trains a machine learning model to classify movie genres? I've used the data from Kaggle.
Nice variable names throughout, very standard.
That has been the default encoding for a very long time, so you might remove it. I can't imagine there's anything in your toolchain for which it makes a difference.
The if __name__ guard is nice. You may as well push all of these statements down into def main(): so that the guard covers just a single line, which calls main(). Why? So local variables go out of scope upon exiting main(). It also makes the function easy to rename to something more meaningful later, once you have written more code and can step back to see what it really does.
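The reviewed script is not shown here, so the following is only a minimal sketch of that layout. The name train_model and its body are hypothetical stand-ins for whatever the script actually does.

```python
def train_model() -> str:
    """Hypothetical stand-in for the script's real work."""
    status = "trained"  # local variable: gone once the function returns
    return status


def main() -> None:
    """Run the whole pipeline; nothing here leaks into module scope."""
    result = train_model()
    print(result)


if __name__ == "__main__":
    # The guard now protects a single call.
    main()
```

Importing this module from a test or a notebook no longer triggers the training run, and main can later be renamed to something more descriptive without touching the guard's logic.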
Now, let's work on that data directory.
You might also consider defining a models_dir parameter.
And let's modify the preprocess signature so we're passing in a CSV filespec.
But wait! The last preprocessing step was to write that file out. DRY. Rather than evaluating the preprocessor for its side effects and having it return None, it makes more sense for it to return the processed CSV pathname. Better still, since it already has a dataframe of joined genres, it should simply return df, and then the caller won't need to re-read a CSV.
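A rough sketch of that signature change follows. The original preprocess body is not shown, so the column names and the genre-joining step below are assumptions made purely for illustration.

```python
import io

import pandas as pd


def preprocess(in_csv) -> pd.DataFrame:
    """Read raw movie rows and return a frame with a joined genre column.

    The "genres" column and the "|" separator are hypothetical.
    """
    df = pd.read_csv(in_csv)
    df["genres"] = df["genres"].str.replace("|", ",", regex=False)
    return df  # hand the frame back instead of writing a CSV and returning None


# The caller uses the returned frame directly; there is no second read_csv.
raw_csv = io.StringIO("title,genres\nA,Drama|Comedy\nB,Horror\n")
movies = preprocess(raw_csv)
```

If a CSV on disk is still wanted, the caller can write movies out itself, which keeps preprocess free of side effects.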
Ok, back to the program in progress.
That's a pretty typical approach that naturally falls out of iterative hacking, nothing wrong with it. We read many columns and project down to just two of them. Consider telling pandas about that up front:
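For instance, read_csv accepts a usecols argument, so the projection can happen while the file is parsed. The column names and inline data here are made up for the sake of a self-contained sketch.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle file: four columns, two of interest.
raw_csv = io.StringIO(
    "id,title,overview,budget\n"
    "1,A,text a,10\n"
    "2,B,text b,20\n"
)

# Ask pandas for just the columns we need instead of dropping the rest later.
df = pd.read_csv(raw_csv, usecols=["title", "overview"])
print(list(df.columns))
```

On a wide file this also saves the memory and parse time spent on columns that would be discarded anyway.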
Again, there's nothing super wrong with that. But it would be better to get in the habit of phrasing it this way:
I have had too many colleagues run afoul of the dreaded SettingWithCopyWarning, and then it takes forever to explain why a script that ran fine for months is suddenly not behaving as expected after one tiny edit. Avoid inplace; you don't need it.
The same applies when you later drop rows with no popular genre, and when you reset indexes.
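The original lines are not shown, so here is just a small sketch of the assign-the-result style, using an invented two-column frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"title": ["A", "B", "C"], "genre": ["Drama", None, "Comedy"]}
)

# Instead of df.dropna(subset=["genre"], inplace=True),
# rebind the name to the frame that dropna returns.
df = df.dropna(subset=["genre"])

# Same idea for the index: no inplace=True, just reassign.
df = df.reset_index(drop=True)
```

Each statement now reads as "new value from old value", which keeps it obvious which object is being modified.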
PEP 8 encourages you to wrap the assigned expression within "extra" ( ) parentheses; no biggie.
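As an illustration of the parenthesized style (the dictionary and its counts are invented, not taken from the reviewed code):

```python
genre_counts = {"Drama": 120, "Comedy": 95, "Western": 4}

# Parentheses let the expression span several lines without backslashes.
total = (
    genre_counts["Drama"]
    + genre_counts["Comedy"]
    + genre_counts["Western"]
)
```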
The lambda is nice enough, but conventionally we spell that expression as itemgetter.
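A small sketch of the equivalence, assuming the lambda was picking one element out of a tuple (the sample pairs are hypothetical):

```python
from operator import itemgetter

genre_counts = [("Drama", 3), ("Comedy", 5), ("Horror", 1)]

# Lambda version: pick the count out of each (genre, count) pair.
by_lambda = sorted(genre_counts, key=lambda pair: pair[1], reverse=True)

# itemgetter version: same key function, using the standard library name.
by_getter = sorted(genre_counts, key=itemgetter(1), reverse=True)
```

Both calls produce the same ordering; itemgetter(1) simply names the "take element 1" operation.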
This is fine. It's worth noting that
would have given you a DataFrame rather than a Series. Possibly you would then have less "reset_index" trouble.
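The expression in question is not shown here, so the following is only one common instance of the Series versus DataFrame distinction, not necessarily the one meant above: selecting with a single label versus a list of labels. The frame is invented.

```python
import pandas as pd

df = pd.DataFrame({"title": ["A", "B"], "genre": ["Drama", "Comedy"]})

as_series = df["genre"]    # single label: a Series
as_frame = df[["genre"]]   # list of labels: a one-column DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```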
Don't be afraid to write a one-sentence """docstring""" for each function you define.
This code achieves its design objectives.
I would be willing to delegate or accept maintenance tasks for this codebase.
- Hi, thanks a lot for the answer. Sure, please feel free to check out the GitHub repo: github.com/t-thirupathi/movies-genre – Thirupathi Thangavel, Jun 17 at 21:12
Title: Rethinking Movie Genre Classification with Fine-Grained Semantic Clustering
Abstract: Movie genre classification is an active research area in machine learning. However, due to the limited labels available, there can be large semantic variations between movies within a single genre definition. We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information within the multi-modal content of movies. By leveraging pre-trained 'expert' networks, we learn the influence of different combinations of modes for multi-label genre classification. Using a contrastive loss, we continue to fine-tune this 'coarse' genre classification network to identify high-level intertextual similarities between the movies across all genre labels. This leads to a more 'fine-grained' and detailed clustering, based on semantic similarities while still retaining some genre information. Our approach is demonstrated on a newly introduced multi-modal 37,866,450 frame, 8,800 movie trailer dataset, MMX-Trailer-20, which includes pre-computed audio, location, motion, and image embeddings.
Poster-Based Multiple Movie Genre Classification Using Inter-Channel Features
Movies genre classification 📓.
In this repo I have created a movie genre classification project in machine learning using NLP, with the nltk library handling the NLP.
Technology used in Project ♨️
Bug / Feature Request 👨‍💻
If you find a bug (the website couldn't handle the query and / or gave undesired results), kindly open an issue here by including your search query and the expected result.
If you'd like to request a new function, feel free to do so by opening an issue here . Please include sample queries and their corresponding results.
Connect with me! 🌐
Known on internet as Yogesh Nile