Ever since the first motion picture camera was invented 120 years ago, film, one of the richest form of human expression and a now gigantic industry that generates tens of billions of dollars every year, has witnessed numerous technological and artistic developments and engaged almost everybody with its entertainment. Nowadays, while many people are working towards producing movies of the highest quality, the stakeholders of this industry are increasingly focused on commercial success. As a result, very few studies have been done on how to make high quality films as opposed to how to achieve box office success. Aspired to discover the secret formula of producing the high quality movies, we as a team are going to explore multiple features of film production and advance people’s understanding of this topic with machine learning models. We believe our endeavors will benefit many people especially film producers, directors, companies or students who are seeking the magic formula to craft high quality films.
Quaternary Quality Class Labels:
A: rating >= 7.1, first quartileBinary Quality Class Labels:
A: rating >= 6.7, top 40%, high quality
We conducted experiments on both the Quaternary Class Label and Binary Class Label dataset using a number of different models. The Quaternary Class Label, which divides movies into more usable quality classes, has the tradeoff of low prediction accuracies. However, it doesn't mean the model is necessarily less usable as the classification contains more information. In either case, the models show that the features such as award indices of director, actors and actresses as well as budget shows significant predictive power of movie quality, captured in ratings or classes in our measure.
The models that yielded considerable accuracy improvements from the base accuracy (25% and 60% as we partitioned the quality classes) for each case are included. Various parameters of the models are tuned to produce their performance. The multilayer perceptron neural network model results in the best accuracy with a slight advantage over decision tree algorithms, especially for the Four Class Label Model. We attribute this difference to the fact that the size of our data is too small for the tree models to achieve good results for the relatively big output space (quaternary vs binary). Please see a more detailed discussion about different models in the report.
This project is created for EECS 349 Machine Learning at Northwestern University.
Guixing Lin
guixinglin2018@u.northwestern.edu
Junhan Liu
junhanliu2015@u.northwestern.edu
Ruohong Zhang
ruohongzhang2017@u.northwestern.edu
Yi Zhang
yizhang2017@u.northwestern.edu