- Question ID
- Creation Date
- Closed Date (when applicable)
- Deletion Date (when applicable)
- Owner user ID
- Number of answers
As David Robinson explains in his introductory post, the StackLite dataset is designed to be easy to read and analyse with any programming language or statistical tool. It is a fantastic resource if you are a data analyst or data scientist and want to crunch some real data!
I decided to give it a go and perform some exploratory analysis using R. More specifically, I am going to answer the following business questions:
- What are the most popular tags?
- How many questions have more than one tag?
- What is the overall closure rate for the site, and which tags have the highest closure rates?
- How long does it take, on average, to close a question?
- Which tags tend to have higher or lower scores?
- And in particular: how do data science languages perform on the above questions?
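To give a rough idea of how these questions translate into R, here is a minimal base R sketch on made-up toy data. The two-table layout, the column names (`Id`, `CreationDate`, `ClosedDate`, `Score`, `Tag`) and every value are assumptions for illustration only, so the resulting numbers mean nothing; it is a sketch of the approach, not the analysis itself.

```r
# Toy stand-ins for a questions table and a question-tag table.
# All names and values here are hypothetical, not taken from the real dataset.
questions <- data.frame(
  Id = 1:4,
  CreationDate = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-02 12:00:00",
                              "2016-01-03 09:00:00", "2016-01-04 15:00:00"),
                            tz = "UTC"),
  ClosedDate = as.POSIXct(c("2016-01-01 16:00:00", NA,
                            "2016-01-04 09:00:00", NA),
                          tz = "UTC"),
  Score = c(5, -1, 0, 12)
)

question_tags <- data.frame(
  Id  = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  Tag = c("r", "ggplot2", "python", "r", "dplyr", "ggplot2", "python"),
  stringsAsFactors = FALSE
)

# Most popular tags: count rows per tag, most frequent first.
popular_tags <- sort(table(question_tags$Tag), decreasing = TRUE)

# Number of questions with more than one tag.
tags_per_question <- table(question_tags$Id)
n_multi_tag <- sum(tags_per_question > 1)

# Overall closure rate: share of questions with a non-missing closed date.
questions$Closed <- !is.na(questions$ClosedDate)
closure_rate <- mean(questions$Closed)

# Average time to close, in hours, over closed questions only.
hours_to_close <- difftime(questions$ClosedDate, questions$CreationDate,
                           units = "hours")
mean_hours_to_close <- mean(hours_to_close, na.rm = TRUE)

# Closure rate and mean score per tag: join tags to questions, then group.
tagged <- merge(question_tags, questions, by = "Id")
closure_by_tag <- tapply(tagged$Closed, tagged$Tag, mean)
score_by_tag   <- tapply(tagged$Score,  tagged$Tag, mean)

# Restrict to a hand-picked set of data science tags (also an assumption).
ds_tags <- c("r", "python")
ds_summary <- data.frame(
  Tag         = ds_tags,
  ClosureRate = as.numeric(closure_by_tag[ds_tags]),
  MeanScore   = as.numeric(score_by_tag[ds_tags])
)
print(ds_summary)
```

On the real data the same steps apply once the files are read in (for example with `read.csv`), although packages such as dplyr would probably be more convenient than base R for tables of that size.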