MySQL: generate missing dates with the previous value

Below is a MySQL table with sparse dates:

col dt_id  value
A1 2018-05-28 30
A1 2018-05-30 20
A1 2018-05-31 50
A1 2018-06-01 50
A1 2018-06-04 80
A1 2018-06-05 50

The output should look like the following, with the missing dates filled in using the last known value:

col dt_id  value
A1 2018-05-28 30
A1 2018-05-29 30
A1 2018-05-30 20
A1 2018-05-31 50
A1 2018-06-01 50
A1 2018-06-02 50
A1 2018-06-03 50
A1 2018-06-04 80
A1 2018-06-05 50

Here, the following rows were generated:

A1 2018-05-29 30
A1 2018-06-02 50
A1 2018-06-03 50
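To pin down the expected semantics, here is a minimal Python sketch of the fill rule, done outside the database. It assumes the rows are held as (col, dt_id, value) tuples, and fill_missing_dates is a hypothetical helper, not anything MySQL provides. It emits one row per day between the first and last date and carries the last known value into each gap.

```python
from datetime import date, timedelta

# Sparse rows from the question: (col, dt_id, value)
rows = [
    ("A1", date(2018, 5, 28), 30),
    ("A1", date(2018, 5, 30), 20),
    ("A1", date(2018, 5, 31), 50),
    ("A1", date(2018, 6, 1), 50),
    ("A1", date(2018, 6, 4), 80),
    ("A1", date(2018, 6, 5), 50),
]


def fill_missing_dates(rows):
    """Hypothetical helper: one row per day from the first to the last date,
    reusing the previous (col, value) on days that have no row."""
    by_date = {dt: (col, value) for col, dt, value in rows}
    day, end = min(by_date), max(by_date)
    last = None
    filled = []
    while day <= end:
        last = by_date.get(day, last)  # the first day always exists, so last is set
        filled.append((last[0], day, last[1]))
        day += timedelta(days=1)
    return filled


filled = fill_missing_dates(rows)
generated = [row for row in filled if row not in rows]
for col, dt, value in generated:
    print(col, dt.isoformat(), value)
```

The printed rows are exactly the three generated rows listed above.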

I know how to solve this in Oracle using last_value() over (partition by ...), but since this is MySQL, it's a bit trickier.

Here is what I've tried:

Create a calendar table (time_table) and populate it with dates:

CREATE TABLE `time_table` (date_id date not null);
create table ints ( i tinyint );
insert into ints values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

insert into time_table (date_id) select date('2016-09-01')+ interval a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i day 
from ints a 
join ints b 
join ints c 
join ints d 
join ints e 
where (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 11322 order by 1;
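The five-way cross join of ints enumerates every five-digit offset from 0 to 99999, and the WHERE clause keeps offsets 0 to 11322. The table therefore gets 11,323 consecutive dates starting at 2016-09-01, the last one being 2047-09-01. A rough Python equivalent of that INSERT ... SELECT, for illustration only:

```python
from datetime import date, timedelta
from itertools import product

digits = range(10)          # stands in for the ints table (0..9)
start = date(2016, 9, 1)    # date('2016-09-01') in the query
max_offset = 11322          # the WHERE clause upper bound

# product(digits, repeat=5) plays the role of "ints a join ints b join ... e";
# each combination encodes one offset a*10000 + b*1000 + c*100 + d*10 + e.
offsets = sorted(
    n
    for n in (a * 10000 + b * 1000 + c * 100 + d * 10 + e
              for a, b, c, d, e in product(digits, repeat=5))
    if n <= max_offset
)
calendar = [start + timedelta(days=n) for n in offsets]

print(len(calendar), calendar[0], calendar[-1])
```

Every offset appears exactly once, so the calendar has no duplicate or skipped days.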

select * from time_table limit 10;
+------------+
| date_id    |
+------------+
| 2018-09-22 |
| 2018-09-21 |
| 2018-09-20 |
| 2018-09-19 |
| 2018-09-18 |
| 2018-09-17 |
| 2018-09-16 |
| 2018-09-15 |
| 2018-09-14 |
| 2018-09-13 |
+------------+

Here is the data for the balance table:
+------+------------+-------+
| A1   | 2018-05-28 |    30 |
| A1   | 2018-05-30 |    20 |
| A1   | 2018-05-31 |    50 |
| A1   | 2018-06-01 |    50 |
| A1   | 2018-06-04 |    80 |
| A1   | 2018-06-05 |    50 |
| B1   | 2018-05-28 |    30 |
| B1   | 2018-05-30 |    20 |
| B1   | 2018-05-31 |    50 |
| B1   | 2018-06-01 |    50 |
| B1   | 2018-06-04 |    80 |
| B1   | 2018-06-05 |    50 |
| C1   | 2018-05-28 |    30 |
| C1   | 2018-05-30 |    20 |
| C1   | 2018-05-31 |    50 |
| C1   | 2018-06-01 |    50 |
| C1   | 2018-06-04 |    80 |
| C1   | 2018-06-05 |    50 |
| D1   | 2018-06-28 |    30 |
| D1   | 2018-07-02 |    20 |
| D1   | 2018-07-04 |    50 |
| D1   | 2018-07-08 |    80 |
| D1   | 2018-07-19 |    50 |
+------+------------+-------+


mysql> select b.id, ab.id, tt.`date_id` as cal_date, b.`mx` as ex_date, val
    -> from time_table tt
    -> inner join (select id, min(date_id) mi, max(date_id) mx from balance group by id) b
    -> on tt.`date_id` >= b.`mi`
    -> and tt.`date_id` <= b.mx
    -> left join (select id, date_id, sum(value) val from balance group by id, date_id) ab
    -> on ab.id = b.id and tt.`date_id` = ab.date_id
    -> order by cal_date;
+------+------+------------+------------+------+
| id   | id   | cal_date   | ex_date    | val  |
+------+------+------------+------------+------+
| A1   | A1   | 2018-05-28 | 2018-06-05 |   30 |
| A1   | NULL | 2018-05-29 | 2018-06-05 | NULL |
| A1   | A1   | 2018-05-30 | 2018-06-05 |   20 |
| A1   | A1   | 2018-05-31 | 2018-06-05 |   50 |
| A1   | A1   | 2018-06-01 | 2018-06-05 |   50 |
| A1   | NULL | 2018-06-02 | 2018-06-05 | NULL |
| A1   | NULL | 2018-06-03 | 2018-06-05 | NULL |
| A1   | A1   | 2018-06-04 | 2018-06-05 |   80 |
| A1   | A1   | 2018-06-05 | 2018-06-05 |   50 |
+------+------+------------+------------+------+
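The result above already contains every calendar day in each id's range. The only step left is replacing each NULL val with the most recent non-NULL val for the same id. A small Python model of what that LEFT JOIN returns makes the gaps explicit. It assumes at most one balance row per id and date_id, so the query's sum(value) just returns that row's value, and calendar_left_join is a hypothetical name.

```python
from datetime import date, timedelta

# balance rows for A1, keyed by (id, date_id) -> value
balance = {
    ("A1", date(2018, 5, 28)): 30,
    ("A1", date(2018, 5, 30)): 20,
    ("A1", date(2018, 5, 31)): 50,
    ("A1", date(2018, 6, 1)): 50,
    ("A1", date(2018, 6, 4)): 80,
    ("A1", date(2018, 6, 5)): 50,
}


def calendar_left_join(balance):
    """Hypothetical model of the query: every day between each id's min and
    max date_id, with None (SQL NULL) where balance has no row."""
    result = []
    for ident in sorted({i for i, _ in balance}):
        days = [d for i, d in balance if i == ident]
        day, end = min(days), max(days)
        while day <= end:
            result.append((ident, day, balance.get((ident, day))))
            day += timedelta(days=1)
    return result


rows = calendar_left_join(balance)
gaps = [(ident, day) for ident, day, val in rows if val is None]
print(gaps)
```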

Answer #1:

For MySQL 8, you can use a recursive CTE:

with recursive rcte(dt_id, col, value) as (
  (
    select dt_id, col, value
    from mytable
    order by dt_id
    limit 1
  )
  union all
  select r.dt_id + interval 1 day
       , coalesce(t.col, r.col)     
       , coalesce(t.value, r.value)
  from rcte r
  left join mytable t on t.dt_id = r.dt_id + interval 1 day
  where r.dt_id < (select max(dt_id) from mytable)
)
select r.col, r.dt_id, r.value
from rcte r
order by r.dt_id


The recursive query builds the result row by row, incrementing the date by one day from the first date until the last. The value (and col) is taken from the original table, which is left-joined on the date. If the original table has no row for a date, the value from the previous row of the recursion is carried forward instead.
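To try the recursive approach without a MySQL 8 server, here is a sketch that runs the same CTE on SQLite through Python's built-in sqlite3 module. It is a port, not the MySQL query itself: date(x, '+1 day') replaces x + interval 1 day, and the anchor's ORDER BY ... LIMIT 1 is wrapped in a derived table because SQLite only accepts ORDER BY/LIMIT at the end of a compound SELECT. The table name mytable follows the answer, and the rows are the question's A1 sample.

```python
import sqlite3

# Sample data from the question (single col, as in the original version).
ROWS = [
    ("A1", "2018-05-28", 30),
    ("A1", "2018-05-30", 20),
    ("A1", "2018-05-31", 50),
    ("A1", "2018-06-01", 50),
    ("A1", "2018-06-04", 80),
    ("A1", "2018-06-05", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (col text, dt_id text, value integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)

# SQLite port of the answer's recursive CTE (dates stored as ISO-8601 text).
QUERY = """
with recursive rcte(dt_id, col, value) as (
  select * from (select dt_id, col, value from mytable order by dt_id limit 1)
  union all
  select date(r.dt_id, '+1 day'),
         coalesce(t.col, r.col),
         coalesce(t.value, r.value)
  from rcte r
  left join mytable t on t.dt_id = date(r.dt_id, '+1 day')
  where r.dt_id < (select max(dt_id) from mytable)
)
select col, dt_id, value from rcte order by dt_id
"""

filled = conn.execute(QUERY).fetchall()
for row in filled:
    print(*row)
```

In MySQL itself, each generated day is one level of recursion, and MySQL 8.0 stops recursive CTEs at cte_max_recursion_depth (1000 by default), so date ranges longer than that need the variable raised. The join is also on the date alone, so this form assumes a single col, as in the original question.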

For older versions, you can use your calendar table with a correlated subquery in the LEFT JOIN's ON clause to get the last existing value for each date:

select b.col, c.date_id, b.value
from time_table c
left join balance b on b.dt_id = (
  select max(dt_id)
  from balance b1
  where b1.dt_id <= c.date_id
)
where c.date_id >= (select min(dt_id) from balance)
  and c.date_id <= (select max(dt_id) from balance)

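The same correlated-subquery idea can be checked quickly on SQLite via Python's sqlite3, with dates stored as ISO-8601 text so that string comparison orders them correctly. The query body is the answer's; only an ORDER BY is added so the output order is deterministic. The 25-day calendar (2018-05-20 to 2018-06-13) is an assumption sized to cover the sample data.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("create table balance (col text, dt_id text, value integer)")
conn.executemany(
    "insert into balance values (?, ?, ?)",
    [
        ("A1", "2018-05-28", 30),
        ("A1", "2018-05-30", 20),
        ("A1", "2018-05-31", 50),
        ("A1", "2018-06-01", 50),
        ("A1", "2018-06-04", 80),
        ("A1", "2018-06-05", 50),
    ],
)

# Small calendar table covering the sample range.
conn.execute("create table time_table (date_id text primary key)")
start = date(2018, 5, 20)
conn.executemany(
    "insert into time_table values (?)",
    [((start + timedelta(days=n)).isoformat(),) for n in range(25)],
)

# For each calendar day, join the balance row with the latest dt_id <= that day.
QUERY = """
select b.col, c.date_id, b.value
from time_table c
left join balance b on b.dt_id = (
  select max(dt_id)
  from balance b1
  where b1.dt_id <= c.date_id
)
where c.date_id >= (select min(dt_id) from balance)
  and c.date_id <= (select max(dt_id) from balance)
order by c.date_id
"""
filled = conn.execute(QUERY).fetchall()
for row in filled:
    print(*row)
```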

Update

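Since the question has changed (the balance table now holds several col values, each with its own date range), here is a version that fills the gaps per col: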

select b.col, c.date_id, b.value
from (
  select col, min(dt_id) as min_dt, max(dt_id) as max_dt
  from balance
  group by col
) i
join time_table c
  on  c.date_id >= i.min_dt
  and c.date_id <= i.max_dt
left join balance b
  on  b.col = i.col
  and b.dt_id = (
    select max(dt_id)
    from balance b1
    where b1.dt_id <= c.date_id
      and b1.col = i.col
)
order by b.col, c.date_id

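A sketch of the per-col query on SQLite (Python's sqlite3), using only the A1 and D1 rows from the question so that two different date ranges are visible. The query text is the answer's. The 100-day calendar starting 2018-05-01 and the (col, dt_id) primary key are assumptions chosen to cover the data and to mirror the indexing advice.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table balance (col text, dt_id text, value integer, primary key (col, dt_id))"
)
conn.executemany(
    "insert into balance values (?, ?, ?)",
    [
        ("A1", "2018-05-28", 30), ("A1", "2018-05-30", 20), ("A1", "2018-05-31", 50),
        ("A1", "2018-06-01", 50), ("A1", "2018-06-04", 80), ("A1", "2018-06-05", 50),
        ("D1", "2018-06-28", 30), ("D1", "2018-07-02", 20), ("D1", "2018-07-04", 50),
        ("D1", "2018-07-08", 80), ("D1", "2018-07-19", 50),
    ],
)

conn.execute("create table time_table (date_id text primary key)")
start = date(2018, 5, 1)
conn.executemany(
    "insert into time_table values (?)",
    [((start + timedelta(days=n)).isoformat(),) for n in range(100)],
)

# Each col gets its own calendar range (i), and the last value is looked up per col.
QUERY = """
select b.col, c.date_id, b.value
from (
  select col, min(dt_id) as min_dt, max(dt_id) as max_dt
  from balance
  group by col
) i
join time_table c
  on  c.date_id >= i.min_dt
  and c.date_id <= i.max_dt
left join balance b
  on  b.col = i.col
  and b.dt_id = (
    select max(dt_id)
    from balance b1
    where b1.dt_id <= c.date_id
      and b1.col = i.col
)
order by b.col, c.date_id
"""
filled = conn.execute(QUERY).fetchall()
by_key = {(col, day): value for col, day, value in filled}
print(len(filled))
print(by_key[("D1", "2018-07-01")], by_key[("D1", "2018-07-10")])
```

A1 yields 9 rows (2018-05-28 to 2018-06-05) and D1 yields 22 rows (2018-06-28 to 2018-07-19), each filled from its own earlier rows only.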

Make sure you have an index on (col, dt_id); ideally it should be the primary key. The date_id column in time_table should also be indexed, or be the primary key.
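Why the composite index matters: for each calendar day, the correlated subquery asks for the largest dt_id <= date_id within one col. An index on (col, dt_id) keeps each col's entries in dt_id order, which lets that value be read from the index instead of scanning the table. As a loose analogy only (not how MySQL executes it), the same lookup over per-col sorted lists is a binary search. The names below are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Sample balance rows: (col, dt_id, value); ISO dates sort correctly as strings.
BALANCE = [
    ("A1", "2018-05-28", 30), ("A1", "2018-05-30", 20), ("A1", "2018-05-31", 50),
    ("A1", "2018-06-01", 50), ("A1", "2018-06-04", 80), ("A1", "2018-06-05", 50),
    ("D1", "2018-06-28", 30), ("D1", "2018-07-02", 20), ("D1", "2018-07-04", 50),
    ("D1", "2018-07-08", 80), ("D1", "2018-07-19", 50),
]

# Hypothetical stand-in for an index on (col, dt_id): per col, dt_id values
# in ascending order plus the value stored on each of those rows.
dates_by_col = defaultdict(list)
values_by_col = defaultdict(list)
for col, dt_id, value in sorted(BALANCE):
    dates_by_col[col].append(dt_id)
    values_by_col[col].append(value)


def last_value_on_or_before(col, day):
    """Value of the row with the largest dt_id <= day for this col, or None."""
    pos = bisect_right(dates_by_col[col], day)  # count of dt_id values <= day
    return values_by_col[col][pos - 1] if pos else None


print(last_value_on_or_before("A1", "2018-06-03"))  # gap day, carries 50
print(last_value_on_or_before("D1", "2018-07-10"))  # gap day, carries 80
print(last_value_on_or_before("D1", "2018-06-01"))  # before D1's first row: None
```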

Answered By: user3327034
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .


