Achilles Timeline Statistics

These statistics are based on the data at http://achillesblog.com/atrpt.php copied into a spreadsheet 6/9/14. A cleaner solution would be to use the original data. Initially read most data as strings because fields need different conversions.

Had some trouble with ÿ (character 0xFF, from -1 for EOF?) appearing before days in CSV status fields. Easiest fix was a global replace in the CSV file.

I wish the data included op/non-op so we could break the statistics down by that.
Would also be useful to have rerupture data, but unlikely that would be reliable.

library(lattice)  # For densityplot
## Warning: package 'lattice' was built under R version 3.0.3

timeline <- read.csv("C:/Users/rseiter/Documents/Health/Achilles/Statistics/AchillesTimeline.csv", 
    stringsAsFactors = FALSE)

Data Conversions

Convert data to more usable formats. The hardest part is converting days fields into actual days (original data would be easier).

# First save the original data to allow debugging of conversions
timelineOrig <- timeline

timeline$Name <- as.factor(timeline$Name)
timeline$Leg <- as.factor(timeline$Leg)
timeline$Gender <- as.factor(timeline$Gender)
timeline$Date.of.Injury <- as.Date(timeline$Date.of.Injury, format = "%m/%d/%Y")
timeline$Date.of.Surgery <- as.Date(timeline$Date.of.Surgery, format = "%m/%d/%Y")

summary(timeline)
##              Name        Location         Leg      Gender  
##  123tech       :   1   Length:2303         :   7   F: 747  
##  19yrold       :   1   Class :character   B:  31   M:1556  
##  1greeneye     :   1   Mode  :character   L:1143           
##  1shann        :   1                      R:1122           
##  1shutinathlete:   1                                       
##  1tufchick     :   1                                       
##  (Other)       :2297                                       
##  Age.when.Injured Activity.When.Injured Date.of.Injury      
##  Min.   : 1.0     Length:2303           Min.   :0208-09-02  
##  1st Qu.:31.0     Class :character      1st Qu.:2009-09-01  
##  Median :38.0     Mode  :character      Median :2011-05-29  
##  Mean   :37.8                           Mean   :2009-07-03  
##  3rd Qu.:44.0                           3rd Qu.:2012-09-19  
##  Max.   :74.0                           Max.   :2103-07-20  
##  NA's   :426                            NA's   :98          
##  Days.to.PWB        Days.to.FWB        Days.to.2.Shoes   
##  Length:2303        Length:2303        Length:2303       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Days.to.Phys.Ther  Date.of.Surgery     
##  Length:2303        Min.   :2000-06-11  
##  Class :character   1st Qu.:2009-10-17  
##  Mode  :character   Median :2011-07-02  
##                     Mean   :2011-05-10  
##                     3rd Qu.:2012-11-07  
##                     Max.   :2014-07-04  
##                     NA's   :24

# Year by itself is useful Date.of.Surgery seems like most reliable field to
# use to calculate Year Primarily because it is baseline for status changes
timeline$Year <- as.numeric(format(timeline$Date.of.Surgery, "%Y"))

# Cumbersome but works...
stringToDays <- function(str) {
    valid = FALSE
    result = 0
    days <- suppressWarnings(as.numeric(sub(".*([0-9])\\s*day.*", "\\1", str, 
        perl = TRUE)))
    if (!is.na(days) && days > 0) {
        valid = TRUE
        result = result + days
    }
    weeks <- suppressWarnings(as.numeric(sub("[^0-9]*([0-9]+)\\s*wk.*", "\\1", 
        str, perl = TRUE)))
    if (!is.na(weeks) && weeks > 0) {
        valid = TRUE
        result = result + 7 * weeks
    }

    if (valid) 
        result else NA
}
# stringToDays('10 wks 1 day')

# How to vectorize this?
timeline$PWB <- sapply(timeline$Days.to.PWB, stringToDays)
timeline$FWB <- sapply(timeline$Days.to.FWB, stringToDays)
timeline$TwoShoes <- sapply(timeline$Days.to.2.Shoes, stringToDays)
timeline$PT <- sapply(timeline$Days.to.Phys.Ther, stringToDays)

summary(timeline[, c("PWB", "FWB", "TwoShoes", "PT")])
##       PWB             FWB           TwoShoes          PT      
##  Min.   :  1.0   Min.   :  1.0   Min.   :   1   Min.   :   1  
##  1st Qu.: 18.0   1st Qu.: 32.0   1st Qu.:  49   1st Qu.:  29  
##  Median : 29.0   Median : 42.0   Median :  61   Median :  44  
##  Mean   : 31.2   Mean   : 46.4   Mean   :  70   Mean   :  51  
##  3rd Qu.: 41.0   3rd Qu.: 55.0   3rd Qu.:  76   3rd Qu.:  58  
##  Max.   :368.0   Max.   :661.0   Max.   :3733   Max.   :3345  
##  NA's   :956     NA's   :1206    NA's   :1419   NA's   :1332

Data Manipulation

Modify data to improve presentation.

For example, surgery dates before 2007 can be removed to give more usable histograms. For now this type of change is done during viewing with only an example here.

cutoff <- as.Date("01/01/2007", format = "%m/%d/%Y")
timelineCopy <- subset(timeline, Date.of.Surgery > cutoff)
hist(timelineCopy$Date.of.Surgery, breaks = "months")

plot of chunk manipulate

Data Exploration

Initial looks at the data. This is used to guide conversion, manipulation, and final statistical work.

hist(timeline$Date.of.Injury, breaks = "months")

plot of chunk explore


# There are problematic injury dates, take a look
dateOrd <- order(timeline$Date.of.Injury)
head(data.frame(string = timelineOrig$Date.of.Injury, date = timeline$Date.of.Injury)[dateOrd, 
    ], 20)
##          string       date
## 2066 09/02/0208 0208-09-02
## 1259 11/01/1007 1007-11-01
## 1790 07/09/1009 1009-07-09
## 1655  10/1/1972 1972-10-01
## 1677  1/24/1987 1987-01-24
## 1151  6/24/1992 1992-06-24
## 2055   1/1/1993 1993-01-01
## 267   6/15/1994 1994-06-15
## 2279   3/5/1998 1998-03-05
## 147    7/5/1998 1998-07-05
## 65     7/1/2004 2004-07-01
## 675  12/23/2004 2004-12-23
## 1359  5/10/2005 2005-05-10
## 1133  10/1/2005 2005-10-01
## 1999 10/29/2005 2005-10-29
## 2278  4/19/2006 2006-04-19
## 717    6/1/2006 2006-06-01
## 1954   7/4/2006 2006-07-04
## 2277  7/13/2006 2006-07-13
## 2104   1/7/2007 2007-01-07
head(timeline[dateOrd, ], 10)
##               Name                         Location Leg Gender
## 2066  laminateddog Smithtown, New Brunswick, Canada   R      M
## 1259     cyndide74                                    L      F
## 1790       preet86                            Salem   R      F
## 1655          yeti         Canberra, ACT, AUSTRALIA   L      M
## 1677 littlefeeties                         St. Paul   L      F
## 1151       19yrold                      los angeles   R      M
## 2055   horsinround                      Northern WV   L      F
## 267    upstate2519                  Spartanburg, SC   R      F
## 2279     nicole123                                    R      F
## 147     indybucfan                                    R      M
##      Age.when.Injured            Activity.When.Injured Date.of.Injury
## 2066                1                       Taekwon-Do     0208-09-02
## 1259               13                  taking levaquin     1007-11-01
## 1790               NA walking drunk with high heels on     1009-07-09
## 1655               12                                      1972-10-01
## 1677               NA                                      1987-01-24
## 1151               NA                                      1992-06-24
## 2055               NA                 Horseback riding     1993-01-01
## 267                33                   hiking/walking     1994-06-15
## 2279               NA                                      1998-03-05
## 147                36                                      1998-07-05
##       Days.to.PWB  Days.to.FWB Days.to.2.Shoes Days.to.Phys.Ther
## 2066 5 wks 5 days                                          6 wks
## 1259                                                            
## 1790                                                            
## 1655                                                            
## 1677                                                            
## 1151        9 wks 8 wks 6 days           9 wks                  
## 2055        6 wks 6 wks 5 days    7 wks 3 days       8 wks 1 day
## 267   1 wk 5 days        5 wks    8 wks 5 days             5 wks
## 2279                                                            
## 147                                                             
##      Date.of.Surgery Year PWB FWB TwoShoes PT
## 2066      2008-09-10 2008  40  NA       NA 42
## 1259      2011-03-31 2011  NA  NA       NA NA
## 1790      2009-07-09 2009  NA  NA       NA NA
## 1655      2009-12-23 2009  NA  NA       NA NA
## 1677      2009-11-13 2009  NA  NA       NA NA
## 1151      2011-06-27 2011  63  62       63 NA
## 2055      2008-09-24 2008  42  47       52 57
## 267       2013-07-31 2013  12  35       61 35
## 2279      2000-06-11 2000  NA  NA       NA NA
## 147       2013-12-26 2013  NA  NA       NA NA

hist(timeline$Date.of.Surgery, breaks = "months")

plot of chunk explore

# Below is probably easier to do with lubridate
# http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
# hist(as.numeric(format(timelineCopy$Date.of.Surgery, '%j'))) # day
hist(as.numeric(format(timelineCopy$Date.of.Surgery, "%W")), breaks = 52, xlab = "Surgery week", 
    main = "Histogram of week in year of surgery")  # week starting Monday

plot of chunk explore

Week of Injury

# Below is probably easier to do with lubridate
# http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
hist(as.numeric(format(timeline$Date.of.Injury, "%W")), breaks = 52, xlab = "Injury week", 
    main = "Histogram of week in year of injury")  # week starting Monday

plot of chunk injuryWeek


hist(timeline$Date.of.Injury, breaks = "months", xlim = c(as.Date("01/01/2008", 
    format = "%m/%d/%Y"), Sys.Date()), freq = TRUE, xlab = "", main = "Histogram by Date of Injury")

plot of chunk injuryWeek

Status Histograms

Look at histograms for each status. For example, days until PWB.

# Limit display of each status
maxPWB <- 17 * 7
maxFWB <- 20 * 7
maxTwoShoes <- 20 * 7
maxPT <- 20 * 7
hist(timeline$PWB[timeline$PWB < maxPWB], breaks = seq(0, maxPWB, 7), xlab = "Days", 
    main = "Histogram of Days Until PWB")

plot of chunk status

densityplot(timeline$PWB[timeline$PWB < maxPWB], xlab = "Days", main = "Density Plot of Days Until PWB")

plot of chunk status


hist(timeline$FWB[timeline$FWB < maxFWB], breaks = seq(0, maxFWB, 7), xlab = "Days", 
    main = "Histogram of Days Until FWB")

plot of chunk status

densityplot(timeline$FWB[timeline$FWB < maxFWB], xlab = "Days", main = "Density Plot of Days Until FWB")

plot of chunk status


hist(timeline$TwoShoes[timeline$TwoShoes < maxTwoShoes], breaks = seq(0, maxTwoShoes, 
    7), xlab = "Days", main = "Histogram of Days Until 2 Shoes")

plot of chunk status

densityplot(timeline$TwoShoes[timeline$TwoShoes < maxTwoShoes], xlab = "Days", 
    main = "Density Plot of Days Until 2 Shoes")

plot of chunk status


hist(timeline$PT[timeline$PT < maxPT], breaks = seq(0, maxPT, 7), xlab = "Days", 
    main = "Histogram of Days Until Physical Therapy")

plot of chunk status

densityplot(timeline$PT[timeline$PT < maxPT], xlab = "Days", main = "Density Plot of Days Until Physical Therapy")

plot of chunk status

Age Statistics

summary(timeline$Age.when.Injured)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    31.0    38.0    37.8    44.0    74.0     426

hist(timeline$Age.when.Injured)

plot of chunk age

densityplot(timeline$Age.when.Injured)

plot of chunk age

Change in Time to PWB by Year

With the advent of more rapid non-op protocols I suspect the time to milestones (especially PWB?) will have changed over the years. See if this is true.

See Exploratory Data Analysis week 2 for plotting examples.

# Limit display
maxPWB <- 17 * 7
maxFWB <- 20 * 7
maxTwoShoes <- 20 * 7
maxPT <- 20 * 7

# densityplot(~ PWB | Year, data=timeline[timeline$PWB < maxPWB &
# timeline$Year >= 2007,], auto.key=TRUE, # Does not help?
# layout=c(1,max(timeline$Year,na.rm=TRUE)-2006), xlab='Days', main='Density
# Plots of Days Until PWB by Year') bwplot(~ PWB | Year,
# data=timeline[timeline$PWB < maxPWB & timeline$Year >= 2007,],
# auto.key=TRUE, # Does not help?
# #layout=c(1,max(timeline$Year,na.rm=TRUE)-2006), xlab='Days',
# main='Density Plots of Days Until PWB by Year') # Alternatively, try
# ggplot faceted histogram library(ggplot2) qplot(PWB,
# data=timeline[timeline$PWB < maxPWB & timeline$Year >= 2007 &
# !is.na(timeline$PWB),], facets = Year ~ ., binwidth=7)

boxplot(PWB ~ Year, data = timeline, subset = timeline$PWB < maxPWB & timeline$Year >= 
    2007, xlab = "", ylab = "Days Until PWB", main = "Box Plots of Days Until PWB by Year")

plot of chunk timeToPWB


boxplot(FWB ~ Year, data = timeline, subset = timeline$FWB < maxFWB & timeline$Year >= 
    2007, xlab = "", ylab = "Days Until PWB", main = "Box Plots of Days Until FWB by Year")

plot of chunk timeToPWB


boxplot(TwoShoes ~ Year, data = timeline, subset = timeline$TwoShoes < maxTwoShoes & 
    timeline$Year >= 2007, xlab = "", ylab = "Days Until Two Shoes", main = "Box Plots of Days Until Two Shoes by Year")

plot of chunk timeToPWB


boxplot(PT ~ Year, data = timeline, subset = timeline$PT < maxPT & timeline$Year >= 
    2007, xlab = "", ylab = "Days Until PT", main = "Box Plots of Days Until Physical Therapy by Year")

plot of chunk timeToPWB


# Summary statistics by year
timelineSubset <- subset(timeline, timeline$Year >= 2007)
tapply(timelineSubset$PWB, timelineSubset$Year, summary)
## $`2008`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    22.0    32.0    33.6    42.0    87.0     104 
## 
## $`2009`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    20.0    31.0    32.0    41.5    89.0     113 
## 
## $`2010`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     2.0    21.0    31.0    33.6    42.0   102.0     139 
## 
## $`2011`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    16.0    27.0    29.8    39.0   114.0     147 
## 
## $`2012`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    17.5    28.0    31.9    41.5   368.0     180 
## 
## $`2013`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     3.0    16.0    27.0    28.9    38.0    91.0     172 
## 
## $`2014`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    15.0    21.0    25.1    29.0   181.0      75
tapply(timelineSubset$FWB, timelineSubset$Year, summary)
## $`2008`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     2.0    37.0    47.0    49.8    58.5   121.0     122 
## 
## $`2009`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       1      34      43      46      56     101     144 
## 
## $`2010`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     7.0    32.0    42.0    51.1    57.2   661.0     169 
## 
## $`2011`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     2.0    27.0    39.0    42.9    50.0   424.0     180 
## 
## $`2012`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    35.0    43.0    48.4    55.0   414.0     237 
## 
## $`2013`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     6.0    32.0    41.0    43.4    54.0   121.0     224 
## 
## $`2014`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     7.0    27.0    35.0    36.2    42.0    92.0     105
tapply(timelineSubset$TwoShoes, timelineSubset$Year, summary)
## $`2008`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    18.0    51.8    61.5    65.9    80.0   137.0     141 
## 
## $`2009`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    49.0    60.0    66.1    77.0   434.0     165 
## 
## $`2010`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##      16      48      57      63      73     235     194 
## 
## $`2011`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    14.0    47.0    57.0    62.3    71.0   284.0     238 
## 
## $`2012`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##      29      52      61     101      76    3730     279 
## 
## $`2013`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    21.0    54.0    63.0    65.6    77.0   114.0     259 
## 
## $`2014`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    32.0    49.0    58.0    61.7    71.0   122.0     117
tapply(timelineSubset$PT, timelineSubset$Year, summary)
## $`2008`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     7.0    34.0    50.0    50.7    63.0   193.0     148 
## 
## $`2009`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    32.8    44.5    48.6    60.2   420.0     160 
## 
## $`2010`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       2      32      48      72      58    3340     185 
## 
## $`2011`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     8.0    29.0    42.0    46.1    54.0   412.0     213 
## 
## $`2012`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    30.8    46.5    50.6    60.2   390.0     255 
## 
## $`2013`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     4.0    26.0    43.0    47.3    58.0   413.0     250 
## 
## $`2014`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     7.0    18.8    31.0    34.9    44.5    95.0      96

Time until PWB, FWB, and PT all show a decreasing trend by year, but time until Two Shoes is relatively constant.