Wednesday, February 6, 2013

Please talk to me

I don't mean that to sound pathetic. But I'm just wrapping up — I hope I'm wrapping up — two projects where I, the statistician and data guru, was not included in all the initial data needs meetings. Both projects are taking me, and everyone else, vastly more time than should be necessary.

Project #1. Subject data was initially extracted and heavily modified. Then, in a separate request and under a separate contract, this data was matched to a secondary data source. No problem yet. Then additional subject data was requested. The number of subjects in this request differed from the original extract. Why? Could it be a change in the underlying data systems? Differing inclusion criteria? No one has the initial request, so it's not clear. Who has to dig through both files to discover the similarities and differences and come up with a hypothesis for the original request? Me.

Then additional data was requested of our secondary data source. The contract had expired. Sorry.

Project #2. I built an overall data tracking and management system for the project. I suggested a vendor we could work with to incorporate an interactive voice response (IVR) system to automate phone calls and collect information each week. I knew the vendor used an Asterisk-based IVR platform, and as a Ruby programmer, I knew I would be able to understand and potentially even extend the system.

Then the details of the data exchange were set without me. Our system would export data to the vendor, and receive the result data, in Excel spreadsheets. This exchange could have been automated, but hey, at least it's not my time being wasted.

When asked to incorporate this new data, I discovered that neither set of data includes the primary key that would allow us to link the IVR data to our original data. So someone has to spend stunningly unnecessary amounts of time after the fact to link the data. The two minutes it would have taken for my input (“Be sure to include the subject ID!”) have become hours of work.
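For what it's worth, the check I would have asked for is tiny. Here is a minimal sketch in Python, assuming the spreadsheets are saved as CSV and that the key column is called `subject_id`. That column name, the function names, and the sample data are all made up for illustration and are not from the actual project.

```python
import csv
import io

REQUIRED_KEY = "subject_id"  # hypothetical column name, not from the real project


def has_key_column(csv_text, key=REQUIRED_KEY):
    """Return True if the CSV header contains the linking key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return key in (reader.fieldnames or [])


def link_on_key(tracking_text, ivr_text, key=REQUIRED_KEY):
    """Attach each IVR row to its subject row; refuse to run without the key."""
    for name, text in (("tracking", tracking_text), ("IVR", ivr_text)):
        if not has_key_column(text, key):
            raise ValueError(f"{name} export is missing the '{key}' column")
    # Assumes one row per subject in the tracking export.
    subjects = {row[key]: row for row in csv.DictReader(io.StringIO(tracking_text))}
    linked = []
    for call in csv.DictReader(io.StringIO(ivr_text)):
        subject = subjects.get(call[key])
        if subject is not None:  # IVR rows with an unknown ID are skipped
            linked.append({**subject, **call})
    return linked
```

Running `has_key_column` on every file before it leaves the building is the automated version of my two minutes of input. With the key in place, the linking itself is a dictionary lookup rather than hours of detective work.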

To summarize, include your data people early and often in your conversations. If you're talking about data — and if you're talking about your research plan, your metrics, your deliverables, you probably are — they won't see it as a waste of time. Surprising them later will waste their time.

<Sigh>, thanks for listening. Back to project #1.