Tuesday 3 April 2012

Calculate years between two dates with Perl

Problem:
Given two dates, calculate the number of years between them.

Preamble:
This looks like a trivial problem, BUT it is not. You could subtract the year of the first date from the year of the second and, if the result is greater than 0, take one off when the month of the second date is smaller than the month of the first (when the months are equal, compare the days). What could possibly go wrong?
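For reference, the naive approach could be sketched as follows. This is only an illustration: `naive_years_between` is a hypothetical helper name, and it assumes well-formed ISO dates (YYYY-MM-DD) with the first date not later than the second.

```perl
use strict;
use warnings;

# Naive whole-year difference between two ISO dates (YYYY-MM-DD).
# Assumes both strings are well formed and that $start is not after $end.
sub naive_years_between {
    my ($start, $end) = @_;
    my ($y1, $m1, $d1) = split /-/, $start;
    my ($y2, $m2, $d2) = split /-/, $end;

    my $years = $y2 - $y1;

    # Take one year off if the anniversary has not been reached yet.
    $years-- if $m2 < $m1 || ($m2 == $m1 && $d2 < $d1);

    return $years;
}

print naive_years_between('2000-06-15', '2012-04-03'), "\n";    # prints 11
print naive_years_between('2000-02-29', '2001-02-28'), "\n";    # prints 0
```

It works for clean input, but every assumption in the comments is something real data will break sooner or later.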

First and most important, there are zillions of date formats, and some of them are very difficult to parse with regular expressions. Then you will want the difference in years forwards... and backwards! You will probably also want the years with a decimal point, so you need to count days, and so on. At some point your script will need to report days and months too. That is life!
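To see why decimal years need a decision about what a year is, here is a small sketch using only the core Time::Local module. The helper names `days_between` and `decimal_years` are made up for this example, and the 365.25-day year is an assumption, not the only reasonable choice.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Signed number of days between two ISO dates (YYYY-MM-DD);
# negative when $to is earlier than $from.
sub days_between {
    my ($from, $to) = @_;
    my @epoch;
    for my $date ($from, $to) {
        my ($y, $m, $d) = split /-/, $date;
        # UTC midnight, so daylight saving time cannot shift the result
        push @epoch, timegm(0, 0, 0, $d, $m - 1, $y);
    }
    return ($epoch[1] - $epoch[0]) / 86_400;
}

# Assumption: one year is 365.25 days. Other definitions give other answers.
sub decimal_years {
    my ($from, $to) = @_;
    return days_between($from, $to) / 365.25;
}

printf "%.4f\n", decimal_years('2010-01-01', '2011-01-01');    # 0.9993 (365 days)
printf "%.4f\n", decimal_years('2012-01-01', '2013-01-01');    # 1.0021 (366 days, leap year)
printf "%.4f\n", decimal_years('2011-01-01', '2010-01-01');    # -0.9993 (backwards)
```

Two spans that are both "one year" on the calendar give two different decimal values, which is exactly the kind of detail that makes the problem less trivial than it looks.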

Solution:
This is a task for the industrial-grade DateTime family of modules from CPAN.

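DateTime::Format::xxx is a family of modules specialised in parsing and formatting almost any date format you can imagine. Once the dates are parsed into DateTime objects, you can do the arithmetic with DateTime itself, which returns DateTime::Duration objects. Date::Calc is a separate alternative that works on plain year, month and day values.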
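As a rough sketch of how the parsing side can cope with more than one format, the snippet below tries several DateTime::Format::Strptime patterns in turn. The pattern list and the `parse_any` helper are hypothetical, and reading 03/04/2012 as day/month/year is an assumption you would have to confirm for your own data.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime;

# Hypothetical list of patterns, tried in order; adapt it to your own data.
my @patterns = ('%Y-%m-%d', '%d/%m/%Y', '%d %b %Y');

my @parsers;
for my $pattern (@patterns) {
    push @parsers, DateTime::Format::Strptime->new(
        pattern  => $pattern,
        locale   => 'en_US',
        on_error => 'undef',    # return undef instead of dying on a mismatch
    );
}

# Return a DateTime for the first pattern that matches, or undef if none does.
sub parse_any {
    my ($string) = @_;
    for my $parser (@parsers) {
        my $dt = $parser->parse_datetime($string);
        return $dt if defined $dt;
    }
    return;
}

for my $raw ('2012-04-03', '03/04/2012', '03 Apr 2012') {
    my $dt = parse_any($raw);
    die "Cannot parse '$raw'\n" unless defined $dt;
    print "$raw => ", $dt->ymd, "\n";    # each line ends with 2012-04-03
}
```

Whatever the input looked like, everything after this point works with DateTime objects only.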

#!/usr/bin/env perl


=head1 [program_name]

 description: Calculate years between dates (two dates or one date and current time)

=head2 First version

  a file with three columns (id, start date, end date)

  - parse the date

      | strptime($strptime_pattern, $string)
      | Given a pattern and a string this function will return a new DateTime object.
      | %F
      | Equivalent to %Y-%m-%d. (This is the ISO style date)


  - check that first date < last date

  - output years as a new column

=cut


use feature ':5.10';
use strict;
use warnings;
use Getopt::Long;
use DateTime::Format::Strptime;
use File::Slurp;


my $prog = $0;
my $usage = <<EOQ;
Usage for $0:

  $prog file_with_dates_in_column_2_and_3

EOQ

# set to 0 if the input file has no header line
my $has_header = 1;

# the input file is the first command-line argument
my $file = shift or die $usage;


my $fmt = DateTime::Format::Strptime->new(
    pattern => '%F',
    locale  => 'en_US',
);

# take care of Windows line endings on a Linux machine (needs both chomp and s/\r$//)
my $dates = [map{chomp;s/\r$//;[split /\t/]} read_file($file)];

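# drop the header line (if any) so that only data rows are parsed
my $header;
$header = shift @$dates if $has_header;

# line number in the input file, used in error messages
my $line = $has_header ? 2 : 1;
foreach my $date_aoa (@$dates) {

    # get the DateTime objects (parse_datetime returns undef on failure)
    my $start = $fmt->parse_datetime($date_aoa->[1]);
    my $end   = $fmt->parse_datetime($date_aoa->[2]);

    # both dates must parse, and the first date must not be after the last
    unless ($start && $end) {
        die "Error parsing id '$date_aoa->[0]' at line $line\n";
    }
    if ($start > $end) {
        die "Start date is after end date for id '$date_aoa->[0]' at line $line\n";
    }

    # get a DateTime::Duration object (that is automatic when doing math with DateTime objects)
    my $dur = $end - $start;

    # whole years only; the remaining months and days are dropped
    $date_aoa->[3] = $dur->years;
    $line++;
}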

print_result($dates);


sub print_result {

    my ($dates) = @_;

    say join("\n", map{join("\t",@$_ )}@$dates);

}
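If you need more than whole years, the same DateTime::Duration object can be asked for other units. The following is a minimal sketch with made-up dates, chosen so that the month and day parts are unambiguous; it is not part of the script above.

```perl
use strict;
use warnings;
use DateTime;

# Example dates, made up for this illustration.
my $start = DateTime->new(year => 2000, month => 6, day => 15);
my $end   = DateTime->new(year => 2012, month => 8, day => 20);

# Subtraction yields a DateTime::Duration that keeps months and days apart.
my $dur = $end - $start;
my ($years, $months, $days) = $dur->in_units('years', 'months', 'days');
print "$years years, $months months, $days days\n";    # 12 years, 2 months, 5 days

# delta_days() ignores months and counts calendar days only.
my $total_days = $start->delta_days($end)->in_units('days');
print "$total_days days in total\n";

# Subtracting the other way round gives a negative duration.
print "backwards\n" if ($start - $end)->is_negative;
```

Months and days are kept apart because a month has no fixed length in days, which is why a single duration cannot simply be converted into decimal years.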



Some Links:

The question http://stackoverflow.com/questions/6549522/how-to-make-datetimeduration-output-only-in-days and its answer http://stackoverflow.com/a/6550372/427129 are also interesting.

http://stackoverflow.com/questions/821423/how-can-i-calculate-the-number-of-days-between-two-dates-in-perl

http://stackoverflow.com/questions/3055422/calculating-a-delta-of-years-from-a-date

http://stackoverflow.com/questions/8308655/how-to-find-the-difference-between-two-dates-in-perl

http://stackoverflow.com/questions/3910858/using-perl-how-do-i-compare-dates-in-the-form-of-yyyy-mm-dd


http://datetime.perl.org/wiki/datetime/page/FAQ%3A_Date_Time_Formats#How_Do_I_Convert_between_Date::Manip_and_DateTime_Objects_-6


The Many Dates and Times of Perl

2 comments:

LeoNerd said...

How about simply strptime/mktime them both, subtract, divide by 60*60*24*365.

The trouble with your "how many years between" question is it's slightly ill-defined anyway, with respect to the definition of a year. How long in fractional years between January 1st 2010 and January 1st 2011? How long between the same in 2012 and 2013 - I'll remind you 2012 was a leap year and 2010 was not.

Do we take a year as 365 days, 365.26 days, some other value? Do we count the number of New Years Eve/New Years Day boundaries we cross between the two dates?

Pablo Marin-Garcia said...

@LeoNerd, if you have zillions of dates to process, or you don't mind equating every year to 60*60*24*365 seconds, using strptime/mktime is OK.

But if you need a bit more versatility and can afford some extra modules and some extra function calls, then DateTime is a good tool to use. Doing calculations between dates is very tricky, so I always suggest DateTime as the first option. BUT if someone knows what they are doing and has good reasons not to use DateTime, then they don't have to use it.