******************************************************************************************.
*This macro produces estimates for aoristic analysis given start and end datetime values (returns a separate dataset named "aor" of estimates).
*Original citation 
*Ratcliffe, J. H. (2002). Aoristic signatures and the Spatio-Temporal analysis of high volume crime patterns. Journal of Quantitative Criminology, pages 23-43.
*http://dx.doi.org/10.1023/A:1013240828824
*online PDF - http://jratcliffe.net/papers/Ratcliffe%20(2002)%20Aoristic%20signatures.pdf
*******************.
*See end of post for example use and consistency check against online app.
*Written by Andrew Wheeler, for any questions/comments feel free to send me an email at apwheele@gmail.com.
*Code could always be improved! - do not hesitate.
******************************************************************************************.


******************************************************************************************.
*PARAMETER DESCRIPTIONS.

*REQUIRED PARAMETERS.
*DateTimeStart = ???? / beginning datetime of incident / needs to be in SPSS datetime format.
*DateTimeEnd  = ???? / end datetime of incident / same format as DateTimeStart.
*Interval = ?????? / type of summary interval / defaults to HourDay.

*Interval Types.
*HourDay - Hours across one day.
*MinWeek - 15 minute intervals across week.
*HourWeek - Hours across week.
*MinDay - 15 minute intervals across one day.
*Week - Day intervals across week.
*NOTE: The way I conduct the estimates is slightly different than previously published. If the interval of a case is LONGER THAN the maximum range of the
summary period (either a day or a week) then I just split the case evenly across all of the bins. This is because the beginning/end times say more
about the nature of victim activities than they do about when the event was perpetrated.
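*Worked example, offered as a sketch of the arithmetic using one case from the example data at the end of this file.
*Begin 20:30, end 03:00 the next day, Interval = HourDay, so the case spans 6.5 hours and wraps past midnight.
*The 20:00 bin holds half an hour (weight 0.5) and the six bins from 21:00 through 02:00 hold a full hour each (weight 1).
*Dividing by the total of 6.5 gives 0.5/6.5 = 0.077 for the 20:00 bin and 1/6.5 = 0.154 for each of the six full bins.
*The weights of a case sum to 1, so each valid case contributes exactly one event to the totals in aor.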

*OPTIONAL.
*Split = ???? / a single variable you want to split the results by (the macro reads one token) / defaults to nothing and results are returned stacked (i.e. long format).
*Data = Y / returns the reshaped file (the original cases in long format, before aggregating to the time intervals) as dataset reshapedLong, otherwise that dataset is deleted.
******************************************************************************************.


******************************************************************************************.
*SOME EXAMPLE COMMANDS ARE [need to replace * with ! to run].

*15 minute bins across day.
*aoristic DateTimeStart = ???? DateTimeEnd  = ???? Interval = MinDay split = ????.
*1 hour bins across day.
*aoristic DateTimeStart = ???? DateTimeEnd  = ???? Interval = HourDay.
*Or can just use the default.
*aoristic DateTimeStart = ???? DateTimeEnd  = ????.

*NOTE: MinWeek and HourWeek are much more computationally intensive, beware of timing for very large
datasets.
******************************************************************************************.


******************************************************************************************.
*FUNCTION BELOW.

DEFINE !aoristic (DateTimeStart = !TOKENS(1)
/DateTimeEnd = !TOKENS(1)
/Interval = !DEFAULT("HourDay") !TOKENS(1)
/Split = !DEFAULT ("") !TOKENS(1)
/Data = !DEFAULT ("N") !TOKENS(1)
).

*Prepping Data.
dataset copy XX_timetemp.
dataset activate XX_timetemp.
rename variables (!DateTimeStart = begin_date)
(!DateTimeEnd = end_date).
formats begin_date end_date (DATETIME17).
match files file = *
/keep begin_date end_date !split.
*getting rid of invalid dates.
compute #mis_beg = MISSING(begin_date).
compute #mis_end = MISSING(end_date).
compute diff = (end_date - begin_date)/60.
select if #mis_beg = 0 and #mis_end = 0 and diff >= 0.

*Defining how many bins for each of the interval types.
*I treat default interval as 'HourDay'.
!IF (!interval = "HourDay") !THEN
!LET !bins = 24
!LET !int = 1
!LET !mod = 60
!IFEND
!IF (!interval = "MinWeek") !THEN
!LET !bins = 672
!LET !int = 2
!LET !mod = 15
!IFEND
!IF (!interval = "HourWeek") !THEN
!LET !bins = 168
!LET !int = 2
!LET !mod = 60
!IFEND
!IF (!interval = "MinDay") !THEN
!LET !bins = 96
!LET !int = 1
!LET !mod = 15
!IFEND
!IF (!interval = "Week") !THEN
!LET !bins = 7
!LET !int = 3
!LET !mod = 1440
!IFEND

*Defining whether event wraps around and begin/end times plus mod.
*This ends up being conditional on the different sets of interval types.
*The if statements above set int = 1 for the daily intervals and int = 2 or 3 for the weekly intervals.
compute timeB = XDATE.TIME(begin_date)/60.
compute timeE = XDATE.TIME(end_date)/60.
compute wkB = XDATE.WKDAY(begin_date) - 1.
compute wkE = XDATE.WKDAY(end_date) - 1.
!IF (!int = 1) !THEN
DO IF diff = 0.
    compute wrap_around = 0.
ELSE IF diff < (24*60) and timeB < timeE.
    compute wrap_around = 0.
ELSE IF diff < (24*60) and timeB >= timeE.
    compute wrap_around = 1.
ELSE IF diff >= (24*60).
    compute wrap_around = 2.
END IF.
!IFEND
!IF (!int = 2 !OR !int = 3) !THEN
DO IF diff = 0.
    compute wrap_around = 0.
ELSE IF diff < (7*24*60) and wkB < wkE.
    compute wrap_around = 0.
ELSE IF diff < (7*24*60) and wkB = wkE and timeB < timeE.
    compute wrap_around = 0.
ELSE IF diff < (7*24*60) and wkB = wkE and timeB >= timeE.
   compute wrap_around = 1.
ELSE IF diff < (7*24*60) and wkB > wkE.
    compute wrap_around = 1.
ELSE IF diff >= (7*24*60).
    compute wrap_around = 2.
END IF.
!IFEND

*Creating new_begin and val_modB, base is based on interval type.
!IF (!int = 1) !THEN
compute #baseB = timeB/!mod.
compute #baseE = timeE/!mod.
!IFEND
!IF (!int = 2 !OR !int = 3) !THEN
compute #baseB = (wkB*(60*24) + timeB)/!mod.
compute #baseE = (wkE*(60*24) + timeE)/!mod.
!IFEND
compute new_begin = TRUNC(#baseB) + 1.
compute val_modB = (#baseB - new_begin) + 1.
compute new_end = TRUNC(#baseE) + 1.
compute val_modE = (#baseE - new_end) + 1.

*Computing vector, based on bins.
vector timesplit(!bins).
do if wrap_around = 0.
loop #i = new_begin to new_end.
    compute timesplit(#i) = 1.
    if #i = new_begin and diff > 0 timesplit(#i) = (1 - val_modB).
    if #i = new_end and diff > 0 timesplit(#i) = val_modE.
end loop.
else if wrap_around = 1.
loop #i = new_begin to !bins.
    compute timesplit(#i) = 1.
    if #i = new_begin timesplit(#i) = (1 - val_modB).
end loop.
loop #j = 1 to new_end.
    compute timesplit(#j) = 1.
    if #j = new_end timesplit(#j) = val_modE.
end loop.
else if wrap_around = 2.
loop #i = 1 to !bins.
    compute timesplit(#i) = 1/!bins.
end loop.
end if.
!LET !timesplitE = !CONCAT('timesplit',!bins)
compute total_time = SUM(timesplit1 to !timesplitE).
*renormalizing timesplit, and puts in a fake zero row to make the final dataset be filled out.
compute const = 1.
!IF (!split <> !NULL) !THEN
sort cases by const !split.
!IFEND
match files file = *
/first = fill
/by const !split.
vector timesplit = timesplit1 to !timesplitE.
loop #i = 1 to !bins.
compute timesplit(#i) = timesplit(#i)/total_time.
if fill = 1 and MISSING(timesplit(#i)) = 1 timesplit(#i) = 0.
end loop.

*Reshaping VARSTOCASES.
varstocases
/make aor_est from timesplit1 to !timesplitE
/index time_per
/drop new_begin new_end val_modB val_modE wrap_around total_time timeB timeE wkB wkE diff.

*Aggregating to time intervals.
DATASET DECLARE aor.
AGGREGATE
  /OUTFILE='aor'
  /BREAK=time_per !split
  /aor_est = SUM(aor_est).
*Conditional statement whether to keep or get rid of.
*reshaped data, defaults to get rid of.
!IF (!Data <> 'Y') !THEN 
DATASET ACTIVATE aor.
dataset close XX_timetemp.
!ELSE
DATASET ACTIVATE XX_timetemp.
DATASET NAME reshapedLong.
select if aor_est > 0.
DATASET ACTIVATE aor.
!IFEND.


*Should consider making base datasets and then matching, so the end result has the full
set of potential options, this will not work for the split variables though.

*Now need to re-figure out what exact time period these belong to.
*This will be conditional on the interval type.
compute time_per = time_per - 1.
!IF (!interval = 'HourDay') !THEN
compute Hour = Time.HMS(time_per).
!IFEND
!IF (!interval = 'MinWeek') !THEN
compute Week = TRUNC(time_per/96) + 1.
compute Minute = MOD(time_per,96)*15*60.
!IFEND
!IF (!interval = 'HourWeek') !THEN
compute Week = TRUNC(time_per/24) + 1.
compute Hour = Time.HMS(MOD(time_per,24)).
!IFEND
!IF (!interval = 'MinDay') !THEN
compute Minute = time_per*15*60.
!IFEND
!IF (!interval = 'Week') !THEN
compute Week = time_per + 1.
!IFEND
*Value labels and formats.
!IF (!interval = 'HourDay' !OR !interval = 'HourWeek') !THEN
formats Hour (TIME5).
variable label Hour 'Hour of Day'. 
!IFEND
!IF (!interval = 'MinDay' !OR !interval = 'MinWeek') !THEN
formats Minute (TIME5).
variable label Minute 'Minute of Day (15 Minute Bins)'. 
!IFEND
!IF (!interval = 'Week' !OR !interval = 'MinWeek' !OR !interval = 'HourWeek') !THEN
value labels Week
1 'Sunday'
2 'Monday'
3 'Tuesday'
4 'Wednesday'
5 'Thursday'
6 'Friday'
7 'Saturday'.
variable label Week 'Day of Week'.
!IFEND
*May consider putting this in a separate macro, then calling it for both the reshaped dataset (if kept).
*and the aggregated dataset. Better to keep them separate, as it will be additional computation on a large
*reshaped dataset.

*Estimating error intervals (Agresti-Coull - 95% Confidence Interval for mean of proportion).
*NOTE: I very much doubt these estimates have proper coverage, and the intervals will likely.
*be too small, still they are useful to show variability in the estimates (especially for small.
*samples).
*Aggregating to get total number of cases.
dataset activate aor.
compute const = 1.
AGGREGATE
  /OUTFILE=*
  MODE=ADDVARIABLES
  /BREAK=const !split
  /Total_Case = SUM(aor_est).
compute per_aor = aor_est / Total_Case.
*Error Intervals.
*Agresti-Coull adjusts both the count and the sample size by the squared z value.
compute #z = 1.96.
compute #n_adj = Total_Case + #z**2.
compute #p_adj = (aor_est + .5*#z**2)/#n_adj.
compute #inside = #p_adj * (1 - #p_adj).
compute low_int = #p_adj - #z*sqrt(#inside/#n_adj).
compute high_int = #p_adj + #z*sqrt(#inside/#n_adj).
if low_int < 0 low_int = 0.
if high_int > 1 high_int = 1.
*Getting rid of extraneous variables.
match files file = *
/drop const time_per.
!ENDDEFINE.
******************************************************************************************.

******************************************************************************************.
*SOME EXAMPLE USES.

*Example dataset applications - would need to uncomment to run.
*Please check http://aoristic.policeanalyst.com/#results for consistency, should yield the same results when no case has an
interval longer than the summary period, so below will give the same answers for the week long routines, but not for the day long routines.

*dataset close ALL.
*output close ALL.

*data list free / cat (F1.0) begin_date2 (ADATE10) begin_time2 (TIME5) end_date2 (ADATE10) end_time2 (TIME5).
*begin data
1 10/01/2012 00:15 10/01/2012 00:15
2 10/01/2012 00:15 10/01/2012 08:30
1 10/01/2012 01:45 10/01/2012 17:40
2 10/01/2012 20:30 10/02/2012 03:00
1 10/01/2012 20:30 10/03/2012 03:00
2 10/01/2012 03:00 10/01/2012 04:00
1 10/01/2012 03:00 10/02/2012 00:00
2 10/01/2012 03:00 10/02/2012 24:00
*end data.
*dataset name times.
*compute begin_full = begin_date2 + begin_time2.
*compute end_full = end_date2 + end_time2.
*formats begin_full end_full (DATETIME17).
*set mprint off.
*set mprint on.

*Hours during day.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = HourDay.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = HourDay Split = cat Data = Y.


*15 minutes bins across week.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = MinWeek.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = MinWeek Split = cat Data = Y.

*Week.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = Week.

*1 hour bins across week.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = HourWeek.

*checking results.
*sort cases by hour.
*split file by hour.
*freq var aor_est /format = notable /statistics = sum.
*split file off.
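
*Additional check, offered as a sketch and not part of the original routine.
*Each valid case is normalized to a total weight of 1, so the sum of aor_est in the aor dataset.
*should equal the number of valid cases (8 for the example data above).
*dataset activate aor.
*descriptives variables = aor_est /statistics = sum.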

*Please check http://aoristic.policeanalyst.com/#results for consistency, should yield the same results when no cases are
split over the potential time period, so above will give the same answers for the week long routines, but not for the day long routines.

*Below data should agree with the online calculator for the day long routines (minutes and hours of day).
*dataset close ALL.
*output close ALL.

*data list free / cat (F1.0) begin_date2 (ADATE10) begin_time2 (TIME5) end_date2 (ADATE10) end_time2 (TIME5).
*begin data
1 10/01/2012 00:15 10/01/2012 00:15
2 10/01/2012 00:15 10/01/2012 08:30
1 10/01/2012 01:45 10/01/2012 17:40
2 10/01/2012 20:30 10/02/2012 03:00
1 10/01/2012 20:30 10/02/2012 03:00
2 10/01/2012 03:00 10/01/2012 04:00
1 10/01/2012 03:00 10/02/2012 00:00
2 10/01/2012 03:00 10/01/2012 24:00
*end data.
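*dataset name times.
*compute begin_full = begin_date2 + begin_time2.
*compute end_full = end_date2 + end_time2.
*formats begin_full end_full (DATETIME17).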

*set mprint on.
*set mprint off.

*Hours during day.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full.

*15 Minutes across day.
*aoristic DateTimeStart = begin_full DateTimeEnd  = end_full Interval = MinDay.
*compute Hour = Time.HMS(Minute).
*split file by Hour.
*freq var aor_est /format = notable /statistics = sum.
*split file off.