Research and prediction of startup success on the Kickstarter platform
Keywords: Forecasting, Extreme Gradient Boosting Method, K-nearest Neighbor Method, Survival Models, Startups, Project Success, Kickstarter Platform
Abstract: The main purpose of this study was to identify and predict the success of new startup projects. To predict whether a given startup would succeed, data analysis methods such as extreme gradient boosting and k-nearest neighbors were applied. These methods predicted project success with high precision, and extreme gradient boosting proved the most effective. Survival models made it possible to estimate the average time spent working on a successful startup and to identify the key industries in which startups become effective, predicting for each of them the time required to turn a promising idea into a successful business. The most successful categories of startup projects were also identified, and the time required to achieve success (survival) was predicted for projects as a whole and for specific project categories. For this purpose, survival models were built on the basis of the Cox proportional hazards model and the Kaplan-Meier estimator.
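To make the survival terminology concrete, the following is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not the study's implementation. The function name `kaplan_meier`, the toy data, and the choice of "campaign reached its goal" as the event are all assumptions made for illustration. Under that choice, S(t) is the estimated probability that a project has not yet succeeded by time t.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until observation stopped.
    observed:  True if the event occurred, False if the record is censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the share that "survives" time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # drop both events and censored records
    return curve


# Hypothetical toy data (not from the Kickstarter dataset): days until a
# campaign reached its goal (True) or was still unfunded when observation
# stopped (False, i.e. censored).
days = [10, 20, 20, 30, 45, 60]
funded = [True, True, False, True, False, True]

curve = kaplan_meier(days, funded)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

Censored records still count toward the number at risk until they leave observation, which is why the estimator does not simply discard unfinished projects.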