Community supported Caltrain notices, one tweet at a time

History, statistics, data, and a call for graphing help

One benefit of saving all of your tweets is that I can generate statistics on trains and their configurations over time.  This data may be useful, for example, for showing Caltrain that it is not consistently sending high-capacity trains during peak times.

While only a few tweets were unrecognizable by my parser, everyone should take a look at the Updating Guide for formatting suggestions.  I’m currently matching on /(1|one)|(2|two)/i and /(old|gallery)|(new|bombardier)/i, so as long as your tweet has one term from each pattern in there we should be good.
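For anyone curious how matching on those two patterns might look, here is a minimal Python sketch. It only illustrates the two regexes quoted above; the function name `parse_tweet`, the "old"/"new" labels, and the sample tweet are made up for illustration, and the real parser may well do more.

```python
import re

# The two patterns quoted in the post, translated to Python syntax.
COUNT_RE = re.compile(r"(1|one)|(2|two)", re.IGNORECASE)
TYPE_RE = re.compile(r"(old|gallery)|(new|bombardier)", re.IGNORECASE)


def parse_tweet(text):
    """Return (count, car_type); either value is None if nothing matched.

    Hypothetical helper, not the parser actually used for the feed.
    """
    count = None
    car_type = None

    m = COUNT_RE.search(text)
    if m:
        # Group 1 is the "1|one" branch, group 2 is the "2|two" branch.
        count = 1 if m.group(1) else 2

    m = TYPE_RE.search(text)
    if m:
        # Group 1 is "old|gallery", group 2 is "new|bombardier".
        car_type = "old" if m.group(1) else "new"

    return count, car_type


# Made-up tweet text, not taken from the real feed.
print(parse_tweet("NB express: two Bombardier"))
```

One thing to keep in mind with unanchored patterns like these: the first pattern matches the first "1" or "2" anywhere in the text, so a digit inside a train number can be picked up before the word you meant.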

One problem I ran into is graphing the data in a meaningful way.  I looked at gnuplot, but damn if I could figure it out.  Granted I spent 5 minutes looking at it, but if I couldn’t get it to work in that time I didn’t want to waste any more on it since I wasn’t sure if anyone would even get value from it.  I took a look at Excel and it kinda did what I wanted, but I use Excel about once a year and I just create trivial sheets.

So here is what I am hoping someone can help me with: graphing this damn data.  Right now I have a list in the following format, but I can save it in pretty much any other format.

{DATE} {TRAIN} {NUMBER OF SLOTS}

The X axis would be DATE, Y would be NUMBER OF SLOTS, and each TRAIN would be a separate series.
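As a starting point, the list could be reshaped into one series per train with a few lines of Python. This is only a sketch under assumptions: it takes the fields to be whitespace-separated, the dates to be written as YYYY-MM-DD, and the sample rows are invented. The function name `build_series` is hypothetical too.

```python
from collections import defaultdict
from datetime import date


def build_series(lines):
    """Group '{DATE} {TRAIN} {NUMBER OF SLOTS}' rows into one series per train.

    Assumes whitespace-separated fields and ISO dates (YYYY-MM-DD).
    """
    series = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        day, train, slots = line.split()
        series[train].append((date.fromisoformat(day), int(slots)))
    # Sort each series by date so a line plot connects points in order.
    return {train: sorted(points) for train, points in series.items()}


# Invented sample rows, for illustration only.
sample = [
    "2009-03-03 329 16",
    "2009-03-02 329 32",
    "2009-03-02 210 32",
]

for train, points in build_series(sample).items():
    print(train, [(d.isoformat(), n) for d, n in points])
```

Each entry in the result is a list of (date, slots) pairs for one train, which is the shape most plotting tools want for a multi-series line chart: dates on the X axis, slots on the Y axis, one line per train.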

Any help?

6 Responses to “History, statistics, data, and a call for graphing help”

  1. You’re a genius with this graphing idea. Sorry I can’t help with the specifics.

    Does my backwards “329 NB …” (train number then direction) screw up your pattern matching? Should I switch it around to the preferred format as outlined in your Updating guide?

  2. Fritz: Your formatting does not negatively impact my parsing.

  3. I say bite the bullet and go with gnuplot. It makes probably the best plots with the least adjustment. Xmgrace is an alternative, but you need to do a lot of manual tweaking, and its autoscale is poor. I could help you with xmgrace, but with gnuplot, I’m also a beginner.

  4. Consider using the Google charts API!

  5. Highly recommend xgobi over gnuplot. Very intuitive, and lots of options for displaying multi-axis or multi-series data.

  6. I’m running a similar data gathering effort but so far have only worried about getting columnar data out.

    With regard to the graphing, do you want to produce on-demand, web-refreshable charts, or static files?

    My workflow would likely start with Excel to model what data and what kind of graph / chart to go with.

    Colleagues at work are quite happy with an open source C# / .NET library called zedgraph.
