There is a time when 2 dataframes need to be summarized in a single table.
For example, a fact table needs to get extra descriptions from outer sources.
Using these tables as an example:
fact_sales_daily
item_id | payment |
---|---|
123 | 13.89 |
345 | 23.45 |
123 | 45.00 |
123 | 56.77 |
789 | 23.76 |
101 | 23.43 |
789 | 23.76 |
scraped_items
item_id | desc | source | expiry_date |
---|---|---|---|
123 | French Fries | amazon | 2022-05-01 |
345 | Pizza | food truck | 2022-05-01 |
789 | Fruit | food truck | 2022-05-01 |
101 | Sweets | shopee | 2022-06-14 |
Say that you have a fact table that needs extra description. This extra table does not exist in the database. Probably a co-worker gives it to you in an excel file.
A very naive way is to loop and assign by row. But that will take a long time if the table is huge.
You can try this way.
Step 1: Create dict of item_id and value you want to map
item_source_dict = dict(zip(scraped_items['item_id'], scraped_items['source']))
Step 2: Map the column to the dict
fact_sales_daily['item_source'] = fact_sales_daily['item_id'].map(item_source_dict)
This step maps the value of the item to its value from the dict. And the result is stored in a column item_source.
You will get something like this
item_id | payment | source |
---|---|---|
123 | 13.89 | amazon |
345 | 23.45 | food truck |
123 | 45.00 | amazon |
123 | 56.77 | amazon |
789 | 23.76 | food truck |
101 | 23.43 | shopee |
789 | 23.76 | food truck |
Simple and readable.