About Dataset

This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.

The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.

About Data collection methodology

The dataset was created using the github repository above. As input data, three public datasets where used.

-github_metadata Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.

-PYPL_survey_2004-2023 Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows from 2004 to 2023 for each month the share of programming related google searches per language.

-stack_overflow_developer_survey Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows from 2011 to 2023 the results of the yearly stackoverflow developer survey.

All these datasets were downloaded on the 12.12.2023. The datasets are all in the github repository above.

Description of the data

The dataset contains a column for the year and then many columns for the different languages, denoting their usage in percent. Every language mentioned in the three datasets is going to be present in this csv file. Additionally, vertical barcharts and piecharts for each year plus a line graph for each language over the whole timespan as png's are provided.

License

This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/ license.

TLDR: You are free to share, adapt, and create derivative works from this dataser as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbl.

Acknowledgments

Thanks go out to

  • stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
  • the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
  • Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.