Movie genre classification using machine learning

Could you review my code below, which trains a machine learning model to classify movie genres? I've used data from Kaggle.


Nice variable names throughout, very standard.

# coding: utf-8

UTF-8 has been the default source encoding since Python 3.0, so you might remove that line. I can't imagine there's anything in your toolchain for which it makes a difference.

The if __name__ guard is nice. You may as well push all these statements down into def main(): so that the guard protects just a single line that calls main(). Why? Local variables go out of scope when main() exits. It also makes the function easy to rename to something more meaningful once you have written more code and can step back to see what it really does.
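As a rough illustration of that layout (the body of main() here is a placeholder and not the code under review), the guard ends up protecting one call:

```python
def main() -> int:
    """Run the whole script; the work shown here is only a stand-in."""
    genres = ["Drama", "Comedy", "Drama"]  # placeholder data, not from the original post
    n_unique = len(set(genres))            # local name, gone once main() returns
    print(f"{n_unique} distinct genres")
    return n_unique


if __name__ == "__main__":
    main()
```

Importing this module from a test or a notebook then runs nothing until main() is called explicitly.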

Now, let's work on that data directory.

You might also consider defining a models_dir parameter.

And let's modify the preprocess signature so we're passing in a CSV filespec.

But, wait! The last preprocessing step was to write that out. DRY. Rather than evaluating the preprocessor for its side effects and returning None, it makes more sense for it to return the processed CSV pathname. Better still, since it already has a dataframe of joined genres, it should simply return df, and then the caller won't need to re-read a CSV.
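A minimal sketch of a preprocessor that returns its dataframe could look like the following. The column names and the stringified-list format of the genres column are assumptions made for illustration; the real Kaggle file may differ.

```python
import ast
import io

import pandas as pd

# Hypothetical raw data standing in for the Kaggle CSV.
RAW_CSV = '''overview,genres
A heist goes wrong.,"['Crime', 'Thriller']"
Two friends take a road trip.,"['Comedy']"
'''


def preprocess(csv_file) -> pd.DataFrame:
    """Read raw movie rows and return them with a comma-joined genres column."""
    df = pd.read_csv(csv_file)
    # Parse each stringified list, then join it into a single string.
    df = df.assign(
        genres=df["genres"].map(lambda s: ",".join(ast.literal_eval(s)))
    )
    return df  # the caller receives the frame directly, no CSV round trip


df = preprocess(io.StringIO(RAW_CSV))
print(df)
```

If the processed CSV is still wanted on disk, the caller can write it out with df.to_csv(...) as a separate, explicit step.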

Ok, back to the program in progress.

That's a pretty typical approach that naturally falls out of iterative hacking, nothing wrong with it. We read many columns and project down to just two of them. Consider telling pandas about that up front:
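One way to tell pandas up front is the usecols parameter of read_csv. The sketch below uses made-up column names and an in-memory CSV, since the original snippet is not shown here:

```python
import io

import pandas as pd

# Hypothetical file with more columns than the model needs.
RAW_CSV = """id,title,overview,genres,budget
1,Heat,A heist goes wrong.,Crime,60
2,Clue,Guests solve a murder.,Comedy,15
"""

# Only the two named columns are parsed; the rest are skipped while reading.
df = pd.read_csv(io.StringIO(RAW_CSV), usecols=["overview", "genres"])
print(df.columns.tolist())
```

Besides being shorter, this avoids holding the unused columns in memory at all.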

Again, there's nothing super wrong with that. But it would be better to get in the habit of phrasing it this way:

I have had too many colleagues run afoul of the dreaded SettingWithCopyWarning, and then it takes forever to explain why a script that ran fine for months is suddenly not behaving as expected after one tiny edit. Avoid inplace -- you don't need it.

Similarly when you later drop rows with no popular genre, and when you reset indexes.

PEP 8 encourages you to wrap a long assigned expression within "extra" ( ) parentheses rather than using backslash continuations, no biggie.
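The reassignment style and the parenthesised wrapping can be sketched together. The frame below is invented for the example, and the subset argument is just one plausible way to say which column must be present:

```python
import pandas as pd

# Hypothetical frame with one missing overview.
df = pd.DataFrame(
    {
        "overview": ["A heist goes wrong.", None, "Two friends take a road trip."],
        "genres": ["Crime", "Drama", "Comedy"],
    }
)

# Rather than calling dropna(..., inplace=True) and reset_index(..., inplace=True),
# build a new frame and rebind the name. The outer parentheses let the
# method chain span several lines without backslashes.
df = (
    df.dropna(subset=["overview"])
    .reset_index(drop=True)
)
print(df)
```

Each step returns a new object, so there is never any doubt about whether a view or a copy was modified.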

The lambda is nice enough, but conventionally we name that expression itemgetter.
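For instance, assuming the lambda was used as a sort key over (genre, count) pairs, which is a guess at the original code, operator.itemgetter from the standard library expresses the same thing:

```python
from collections import Counter
from operator import itemgetter

# Hypothetical genre tallies.
genre_counts = Counter(["Drama", "Comedy", "Drama", "Crime", "Drama", "Comedy"])

# Lambda version: sort the pairs by their second element, the count.
by_count_lambda = sorted(genre_counts.items(), key=lambda pair: pair[1], reverse=True)

# itemgetter(1) builds an equivalent callable and states the intent by name.
by_count = sorted(genre_counts.items(), key=itemgetter(1), reverse=True)
print(by_count)
```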

This is fine. It's worth noting that

would have given you a DataFrame rather than a Series. Possibly you would then have less "reset_index" trouble.
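The expression the answer refers to is not shown here. One common situation where the DataFrame-versus-Series distinction comes up is single versus double bracket column selection, sketched below with invented data:

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({"overview": ["A heist goes wrong."], "genres": ["Crime"]})

as_series = df["genres"]   # single brackets select one column as a Series
as_frame = df[["genres"]]  # a list of column names yields a DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```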

Don't be afraid to write a one-sentence """docstring""" for each function you define.

This code achieves its design objectives.

I would be willing to delegate or accept maintenance tasks for this codebase.

  • Hi, thanks a lot for the answer. Sure, please feel free to check out the GitHub repo. – Thirupathi Thangavel, Jun 17 at 21:12

Computer Science > Computer Vision and Pattern Recognition

Title: Rethinking movie genre classification with fine-grained semantic clustering

Abstract: Movie genre classification is an active research area in machine learning. However, due to the limited labels available, there can be large semantic variations between movies within a single genre definition. We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information within the multi-modal content of movies. By leveraging pre-trained 'expert' networks, we learn the influence of different combinations of modes for multi-label genre classification. Using a contrastive loss, we continue to fine-tune this 'coarse' genre classification network to identify high-level intertextual similarities between the movies across all genre labels. This leads to a more 'fine-grained' and detailed clustering, based on semantic similarities while still retaining some genre information. Our approach is demonstrated on a newly introduced multi-modal 37,866,450 frame, 8,800 movie trailer dataset, MMX-Trailer-20, which includes pre-computed audio, location, motion, and image embeddings.

Poster-Based Multiple Movie Genre Classification Using Inter-Channel Features

Movies Genre Classification 📓

In this repo I have created a Movies Genre Classification project in machine learning using NLP, and I am using the nltk library for NLP.

Bug / Feature Request 👨‍💻

If you find a bug (the website couldn't handle the query and / or gave undesired results), kindly open an issue here by including your search query and the expected result.

If you'd like to request a new function, feel free to do so by opening an issue here . Please include sample queries and their corresponding results.

Connect with me! 🌐

Known on internet as Yogesh Nile

  • Jupyter Notebook 99.5%
  • Python 0.5%


  1. Movie Genres Classification Web App

  2. Genre Classification Dataset IMDb

  3. Movie Genre+Plot+Poster

  4. GitHub

  6. Multi-label classification problem script testing dataset

  1. Codereview: Movie genre classification using machine learning

  2. Winning Data Science Competitions: Jeong-Yoon Lee

  3. Kaggle Discussion and classifying disaster Tweets on Kaggle with Shyambhu

  4. Mewga

  5. Mewga

  6. Satellite image classification


  1. Movie Genre Classification

    Predict movie genre by a plot summary

  2. Movie Genre Classification

    Predict movie genre by a plot summary.

  3. Genre Classification Dataset IMDb

    IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content online - including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.

  4. NLP(movie genre classification)

    Kaggle notebook using data from movie genre data. Released under the Apache 2.0 open source license.

  5. Movie Genre Prediction Using Multi Label Classification

    Predicting Movie Genres using NLP - An Awesome Introduction to Multi-Label Classification. Prateek Joshi, published April 22, 2019, last modified July 19, 2022.

  6. movie-genre-classification · GitHub Topics · GitHub

    Keras implementation of multi-label classification of movie genres from IMDB posters. Updated on May 14, 2020. Jupyter Notebook.

  7. python

    Movie genre classification using machine learning. could you review my below code to train a machine learning model to classify movie genres? I've used the data from Kaggle. #!/usr/bin/env python # coding: utf-8 import pandas as pd import numpy as np import ast import joblib from collections import Counter from sklearn.model_selection import ...

  8. PDF Multi-label Movie Genre Classification Using Multiple Modalities

    The task of genre classification is an important problem in machine learning. Accurate classification can be applied to both cataloging and recommending content to users. In the case of movie genre classification, usually a single label is not sufficient to properly characterize each example, e.g. a film

  9. PDF Classification of Movie Posters to Movie Genres

    able to convey the genre of the movie to a human observer, with no prior knowledge of the movie, at a glance. Dataset: Kaggle-Movie Genre from its Poster [1]. 36898 256X256 resolution RGB posters. Each poster is labeled with types of genres. 29887 train set, 3321 val set, 3690 test set. ResNet-50 VGG16 DenseNet-169 Drama Comedy Romance Action Crime

  10. [2012.02639] Rethinking movie genre classification with fine-grained

    Movie genre classification is an active research area in machine learning. However, due to the limited labels available, there can be large semantic variations between movies within a single genre definition. We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information within the multi-modal content of movies. By leveraging pre-trained 'expert' networks, we learn ...

  11. Multimodal deep learning to predict movie genres

    Published in Towards Data Science · 8 min read · May 21, 2020 -- 1 The Dark Knight. Image from TMDB. "Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets.

  12. Movie genre classifier using a dataset created using Google Images

    In most cases, we can see clearly why the classifier got confused. For example, the first movie poster (Guardians of the galaxy) is mislabeled as 'romance' in our dataset, the second image has nothing scary in it for it to be labeled as 'horror', the 5th poster (batman) looks a lot like a 'horror' movie poster, the 6th poster seems a bit irrelevant and the 8th poster (Jack Reacher ...

  13. Multi-Label Genre Classification of Movies From Their Posters

    Figure 1 Approach Overview Our goal in this project was to create a supervised learning model that would accurately classify a movie's genre based on its poster. In order to do this, we needed to obtain genre classification and poster images for a significant amount of movies.

  14. Predicting Movie Genres Based on Plot Summaries

    Kunal Gupta, Jun 11, 2019, 9 min read. In this article, we will take a very hands-on approach to understand the multi-label...

  15. GitHub

    Genre-Classification-IMDbDataset This report is based on the dataset from a competition on the Kaggle website, which challenges people to analyze and build a classification model that classifies the genre of a movie by its description.

  16. GitHub

    Movie Genre Classification This project's goal is to build and compare results of multiple machine learning models to classify movie's genres based on their plot overviews. The dataset can be found in the following link. ( ).

  17. Poster-Based Multiple Movie Genre Classification Using Inter-Channel

    To classify the movie genres, we reconstructed the poster dataset into 12 multi-genres that emphasized the characteristics of each poster. Published in: IEEE Access ( Volume: 8 ) Article #: Page (s): 66615 - 66624 Date of Publication: 06 April 2020 ISSN Information: Electronic ISSN: 2169-3536 INSPEC Accession Number: 19521509

  18. CNN Approach for Predicting Movie Genre from Posters!

    Here the model only predicts for 3 types of genres but in the future, a more complex model using ResNet can be built that predicts for more than 10 or 20 types of genres. Machine learning algorithm K-Nearest Neighbours can also be used for this purpose. Conclusion. Above we saw how we can build a model that can predict movie genre from its poster.

  19. How to Identify Movie Genres: Beginner's Guide to 13 Film Genres

    Horror films often feature jump scares and have more action than dialogue. Primary film genres include action, adventure, comedy, drama, fantasy, horror, musicals, mystery, romance, science fiction, sports, thriller, and Western. War films and zombie films are examples of themes that can span various genres, like action, drama, or thriller.

  20. bococharbel/kaggle_movie_genre_classification

    kaggle_movie_genre_classification. Author: Charbel Boco. Uses Inception V3 to classify the Kaggle Movie Genre from Its Poster dataset. The dataset is available on the Kaggle website; you need to log in to Kaggle to download it. Put the CSV files (train.csv and test.csv) in the root directory of your project.

  21. Movies Genre Classification

    kaggle_movie_train.csv Movies Genre Classification 📓 In this repo i have created a Movies Genre Classification project in machine learning using NLP, and i am using nltk Library for NLP. Technology used in Project ♨️ ScreenShot 📸 WordCloud Bug / Feature Request 👨‍💻

  22. Movie reviews classification

    Movie reviews (and more) data for sentiment classification