History, statistics, data, and a call for graphing help
One benefit of saving all of your tweets is that I can generate statistics for trains and their configurations over time. This data may be useful, for example, for demonstrating to Caltrain that they are not consistently sending high-capacity trains during peak times.
My parser failed to recognize only a few tweets, but everyone should still take a look at the Updating Guide for formatting suggestions. I’m currently matching on /(1|one)|(2|two)/i and /(old|gallery)|(new|bombardier)/i, so as long as you have those in there we should be good.
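For anyone curious how matching like that might look, here is a rough Python sketch. It is not my actual parser: the function name `parse_tweet`, the returned labels, and the reading of the first pattern as a count and the second as a car type are all assumptions made for illustration.

```python
import re

# The two patterns quoted above, written for Python's re module.
COUNT_RE = re.compile(r"(1|one)|(2|two)", re.IGNORECASE)
TYPE_RE = re.compile(r"(old|gallery)|(new|bombardier)", re.IGNORECASE)


def parse_tweet(text):
    """Hypothetical helper: return (count, kind), or None if a pattern is missing."""
    count_match = COUNT_RE.search(text)
    type_match = TYPE_RE.search(text)
    if count_match is None or type_match is None:
        return None  # unrecognizable tweet
    # Group 1 is the first alternative of each pattern, group 2 the second.
    count = 1 if count_match.group(1) else 2
    kind = "gallery" if type_match.group(1) else "bombardier"
    return count, kind
```

One thing to watch with unanchored patterns like these: a digit inside a train number (the 2 in 329, say) also matches, so a real parser would probably want to strip the train number first or add word boundaries.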
One problem I ran into is graphing the data in a meaningful way. I looked at gnuplot, but damn if I could figure it out. Granted I spent 5 minutes looking at it, but if I couldn’t get it to work in that time I didn’t want to waste any more on it since I wasn’t sure if anyone would even get value from it. I took a look at Excel and it kinda did what I wanted, but I use Excel about once a year and I just create trivial sheets.
So here is what I am hoping someone can help me with — graphing this damn data. Right now I have a list in the following format, but I can save it in pretty much any way.
{DATE} {TRAIN} {NUMBER OF SLOTS}
The X axis would be DATE, Y would be NUMBER OF SLOTS, and each TRAIN would be a separate series.
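To make that layout concrete, here is a minimal Python sketch that groups records in the format above into one series per train. The ISO date format (YYYY-MM-DD), the whitespace separator, and the function names are assumptions, and the optional plotting helper assumes matplotlib is installed; treat it as a starting point, not a finished solution.

```python
from collections import defaultdict
from datetime import date, datetime


def load_series(lines):
    """Group 'DATE TRAIN SLOTS' records into one (dates, slots) pair per train."""
    series = defaultdict(lambda: ([], []))
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        # Assumed date format: YYYY-MM-DD
        day = datetime.strptime(parts[0], "%Y-%m-%d").date()
        dates, slots = series[parts[1]]
        dates.append(day)
        slots.append(int(parts[2]))
    return dict(series)


def plot_series(series, path="slots.png"):
    """Optional: draw one line per train (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for train, (dates, slots) in sorted(series.items()):
        ax.plot(dates, slots, marker="o", label=train)
    ax.set_xlabel("DATE")
    ax.set_ylabel("NUMBER OF SLOTS")
    ax.legend(title="TRAIN")
    fig.autofmt_xdate()
    fig.savefig(path)
```

Records are kept in the order they are read, so the input should already be sorted by date, or each series sorted before plotting.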
Any help?
September 9th, 2008 at 1:38 pm
You’re a genius with this graphing idea. Sorry I can’t help with the specifics.
Does my backwards “329 NB …” (train number then direction) screw up your pattern matching? Should I switch it around to the preferred format as outlined in your Updating Guide?
September 9th, 2008 at 1:40 pm
Fritz: Your formatting does not negatively impact my parsing.
September 11th, 2008 at 7:43 am
I say bite the bullet and go with gnuplot. It probably makes the best plots with the least adjustment. Xmgrace is an alternative, but you need to do a lot of manual tweaking, and its autoscale is poor. I could help you with xmgrace, but with gnuplot, I’m also a beginner.
September 17th, 2008 at 11:32 am
Consider using the Google charts API!
October 26th, 2008 at 12:40 pm
Highly recommend xgobi over gnuplot. Very intuitive, and lots of options for displaying multi-axis or multi-series data.
October 30th, 2008 at 11:01 am
I’m running a similar data gathering effort but so far have only worried about getting columnar data out.
With regards to the graphing, do you want to produce on demand web refreshable charts, or static files?
My workflow would likely start with Excel to model what data and what kind of graph / chart to go with.
Colleagues at work are quite happy with an open source C# / .NET library called ZedGraph.