How to write good statistical reports for free (and make your boss happy)

Over the last few years I worked in a multinational company, in a position between logistics/warehouse support and IT: data, data, data and more data to analyze and convert into useful KPIs and charts.

At the same time I was following the Coursera Data Science Specialization, a nine-module specialization created by Johns Hopkins University and full of useful information about statistics, data science, machine learning and more. I had always been curious about the R language, and in this course I used it in depth with the help of the RStudio environment.

This document is meant to be a quick tutorial on what you need to set up the platform and be productive within hours or days. It is based on my personal experience; if you have any suggestions, please write a comment!

Software and hardware

As a first step you can use your personal computer with RStudio Desktop, a very useful IDE specially designed for R and statistical processing. If you want to install it you will need:

If you can access a shared folder, it will be useful for easily sharing scripts and results.

If you have space on an internal factory server and you can install applications on it, maybe it's time to try RStudio Server: it is the same as the desktop version, but it runs on the server and you access it from your browser 🙂 Not bad at all 🙂

The development process

To develop a complex report and make it work, I used the steps below.

Create the scripts

The first step is to download and import the data from the various information systems and convert it into a format that is useful for R. In my case I needed to process data from:

  • CSV files downloaded from some shared folders
  • HTML tables provided by some PHP pages
  • Query results from some remote MySQL databases

R can easily manage all this and much more with the libraries provided by the CRAN archive.
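As a rough sketch of the three cases above, the functions below read a CSV file with base R, an HTML table with the xml2 and rvest packages, and a query result with DBI and RMySQL. The package choice is only one possibility among those on CRAN, and every path, URL, credential and column name here is a placeholder, not something taken from my real systems. Only the CSV part is actually run, on a small made-up file.

```r
# Sketch only: paths, URLs, credentials and column names are placeholders.

# CSV file from a shared folder (base R only)
read_csv_export <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# First HTML table found in a page (needs the xml2 and rvest packages);
# assumes the page contains at least one <table>
read_html_table <- function(url) {
  page   <- xml2::read_html(url)
  tables <- rvest::html_table(page)   # one table object per <table>
  as.data.frame(tables[[1]])
}

# Result of a query on a remote MySQL database (needs DBI and RMySQL)
read_mysql_query <- function(sql, host, dbname, user, password) {
  con <- DBI::dbConnect(RMySQL::MySQL(), host = host, dbname = dbname,
                        user = user, password = password)
  on.exit(DBI::dbDisconnect(con))     # always close the connection
  DBI::dbGetQuery(con, sql)
}

# Runnable demonstration on a temporary, made-up CSV file
csv_path <- file.path(tempdir(), "export.csv")
writeLines(c("item,quantity", "A100,10", "B200,5"), csv_path)
stock <- read_csv_export(csv_path)
str(stock)
```

All three functions return a plain data.frame, so the rest of a script does not need to know where the data came from.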

The main idea in this step is to write several scripts (maybe one per system or per kind of data), each able to perform these steps:

  • download the data needed,
  • convert the data into a data.frame and adjust the format of the variables (e.g. a timestamp has to be converted into a native time format),
  • clean, convert and reformat any data that needs it (e.g. convert ),
  • summarize the dataset if needed (e.g. calculate the sum by month),
  • save all the datasets in the “data” folder, so that the report can load its data from a common place.
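The steps above can be sketched in base R as follows. This is a minimal illustration, not my production script: the function name, the column names, the file names and the sample rows are all invented, and the small data.frame stands in for the download step.

```r
# Sketch of one import script; all names and sample values are made up.
prepare_movements <- function(raw, data_dir = "data") {
  # convert the text timestamp into a native date-time class
  raw$timestamp <- as.POSIXct(raw$timestamp,
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  raw$quantity  <- as.numeric(raw$quantity)

  # clean: drop rows whose timestamp or quantity could not be read
  clean <- raw[!is.na(raw$timestamp) & !is.na(raw$quantity), ]
  clean$date  <- as.Date(clean$timestamp)
  clean$month <- format(clean$timestamp, "%Y-%m")

  # summarize: total quantity by month
  monthly <- aggregate(quantity ~ month, data = clean, FUN = sum)

  # save everything in the common data folder
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(clean,   file.path(data_dir, "movements.rds"))
  saveRDS(monthly, file.path(data_dir, "movements_monthly.rds"))
  monthly
}

# Stand-in for the downloaded data (the last row is deliberately broken)
raw <- data.frame(
  timestamp = c("2016-01-05 08:30:00", "2016-01-20 14:00:00",
                "2016-02-02 09:15:00", "not a date"),
  quantity  = c("10", "5", "7", "3"),
  stringsAsFactors = FALSE
)
data_dir <- file.path(tempdir(), "data")
monthly  <- prepare_movements(raw, data_dir = data_dir)
print(monthly)
```

With these sample rows the broken line is dropped and the monthly totals are 15 for 2016-01 and 7 for 2016-02. Saving with saveRDS keeps the column classes, so the report does not have to convert the timestamps again.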

Create the reports

The output has to be a single file, from which images can be extracted if needed; we can use the knitr package to generate HTML reports.

The report is a document made of text in Markdown format and pieces of code (called chunks) that print formatted data, tables, charts, etc. R Markdown also allows us to pass arguments to the report (e.g. to use the same report to visualize warehouse movements between two dates passed as arguments).

The report can have a structure similar to the following skeleton:
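A minimal sketch of such a skeleton, under some assumptions: the file name, the dataset data/movements.rds (with a date and a quantity column) and the parameters from and to are hypothetical. To keep the example in plain R, the script builds the R Markdown source line by line and writes it to a temporary file; in practice you would simply type the same lines into an .Rmd file placed next to the “data” folder.

```r
# Sketch: write a parameterised R Markdown skeleton to disk.
# File, dataset, column and parameter names are all hypothetical.
fence <- strrep("`", 3)   # three backticks open and close a chunk
chunk <- function(header, ...) {
  c(paste0(fence, "{r ", header, "}"), ..., fence, "")
}

skeleton <- c(
  "---",
  'title: "Warehouse movements"',
  "output: html_document",
  "params:",
  '  from: "2016-01-01"',
  '  to: "2016-01-31"',
  "---",
  "",
  chunk("load-data, include=FALSE",
        "library(ggplot2)",
        "library(xtable)",
        'movements <- readRDS("data/movements.rds")'),
  chunk("elaborate, include=FALSE",
        "from <- as.Date(params$from)",
        "to   <- as.Date(params$to)",
        "selected <- subset(movements, date >= from & date <= to,",
        "                   select = c(date, quantity))"),
  "## Executive summary",
  "",
  "Between `r from` and `r to` the warehouse moved `r sum(selected$quantity)` units.",
  "",
  "## Movements by day",
  "",
  chunk("chart, echo=FALSE",
        "ggplot(selected, aes(x = date, y = quantity)) + geom_col()"),
  chunk("table, echo=FALSE, results='asis'",
        "selected$date <- format(selected$date)  # show dates as text",
        'print(xtable(selected), type = "html")')
)

report_file <- file.path(tempdir(), "warehouse_report.Rmd")
writeLines(skeleton, report_file)
```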

In which we can recognize:

  • the code to load the datasets
  • the code to process the loaded data
  • an introduction/executive summary
  • the text with:
    • charts generated with the ggplot2 package
    • tables generated with the xtable package
    • other information generated by the scripts

 

Automagically make it work

When the script is ready we usually want to run it automatically, so that it generates our report and saves it in a shared folder. We may also want to repeat the same approach for the documents of the last 7 days, in order to update them all in a single shot; to do this we can use the following code.
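What follows is a sketch rather than my exact script. It assumes a parameterised report called warehouse_report.Rmd with from and to parameters and an output folder called reports; these names, like the script name and the schedule in the first comment, are invented for the example. The rendering itself is done by rmarkdown::render, which is passed in as an argument so that the loop can be tried with a dummy function first.

```r
# make_reports.R: sketch; file, folder and parameter names are assumptions.
# Possible crontab entry (path and time are examples):
#   0 7 * * * Rscript /path/to/make_reports.R

render_last_days <- function(n_days = 7, today = Sys.Date(),
                             input = "warehouse_report.Rmd",
                             output_dir = "reports",
                             render_fun = rmarkdown::render) {
  # yesterday first, then back to n_days ago, as "YYYY-MM-DD" strings
  days <- as.character(today - seq_len(n_days))
  for (day in days) {
    render_fun(
      input       = input,
      output_file = paste0("warehouse_", day, ".html"),
      output_dir  = output_dir,
      params      = list(from = day, to = day),
      quiet       = TRUE
    )
  }
  invisible(days)
}

# Render only if the report source is present in the working directory
if (file.exists("warehouse_report.Rmd")) {
  render_last_days()
}
```

Each run overwrites the seven HTML files, so late corrections in the source data are picked up the next time the script runs.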

We only need to call Rscript from cron with this script as its argument to automagically have our report ready (maybe one hour before we reach the workplace :)), make your boss very happy and your life easier!
