How to add a new column to tables?

Overview: This script adds a new column to every table by editing the table's cell data. For each row, it appends a new cell at the designated column position.

Python Script:

The script below also includes a webhook function for debugging. It lets you log and trace data or errors while the script runs.

import uuid
import requests

# Function to create a new table cell
def get_table_field(row, col, label, text, xmin=0, ymin=0, xmax=0, ymax=0):
    cell = {
        "id": str(uuid.uuid4()),  # Generates a unique ID for each cell
        "row": int(row),  # Row number
        "col": int(col),  # Column number
        "row_span": 1,  # Default row span
        "col_span": 1,  # Default column span
        "label": str(label),  # Label for the new column
        "xmin": xmin,  # Minimum x-coordinate (bounding box)
        "ymin": ymin,  # Minimum y-coordinate (bounding box)
        "xmax": xmax,  # Maximum x-coordinate (bounding box)
        "ymax": ymax,  # Maximum y-coordinate (bounding box)
        "score": 0.89697266,  # Default confidence score
        "text": str(text),  # Text value for the cell
        "row_label": "",  # No row label
        "verification_status": "correctly_predicted",  # Verification status
        "status": "",  # Status field
        "failed_validation": "",  # Validation failure flag
        "label_id": ""  # Label ID
    }
    return cell

# Webhook URL to send debugging or processing information
webhook_url = 'YOUR_WEBHOOK_URL'

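# Function to trigger a webhook for debugging
def webhook(body="You missed adding parameter", send=True):
    if not send:
        return False
    # Wrap plain strings in a JSON object; send dicts and lists as-is
    payload = {"text": body} if isinstance(body, str) else body
    try:
        response = requests.post(webhook_url, json=payload, timeout=10)
    except requests.RequestException:
        # A failed debug call should not interrupt the main script
        return False
    return response.status_code == 200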

# Main handler function
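def handler(input_data):
    # Fall back to the unsorted input so the final return always has a value,
    # even if sorting itself raises
    sorted_input = input_data
    try:
        # Sort input data by page number
        sorted_input = sorted(input_data, key=lambda x: x['page'])
        webhook(sorted_input)  # Send sorted data for debugging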

        # Iterate through the pages
        for page in sorted_input:
            # Retrieve moderated or predicted boxes from the page
            boxes = page.get('moderated_boxes') or page.get('predicted_boxes', [])

            # Loop through the boxes to find table-related ones
            for box in boxes:
                if box.get('label') == "table":
                    # Sort cells by row and column
                    cells = box.get('cells', [])
                    cells = sorted(cells, key=lambda x: (x['row'], x['col']))
                    
                    # Identify the maximum row in the table
                    max_row = 0
                    for cell in cells:
                        max_row = max(max_row, cell['row'])

                    # Append one new cell per row (rows 1 to max_row) in column 10
                    for row in range(max_row):
                        cells.append(get_table_field(row+1, 10, "New column header", "New value"))
                    
                    # Update the box with the new cells
                    box['cells'] = cells

    except Exception as e:
        import traceback
        # Log the exception details via webhook
        traceback_str = traceback.format_exc()
        webhook(traceback_str)
    
    # Send final output via webhook for debugging
    webhook(sorted_input)
    return sorted_input
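
The handler reads a specific set of keys from its input. The snippet below is a minimal sketch of that shape, inferred only from the keys the script accesses (page, moderated_boxes, predicted_boxes, label, cells, row, col); real payloads likely carry more fields. The sample_input data and the select_table_boxes helper are hypothetical and exist only for illustration.

```python
# Hypothetical sample input: two pages, deliberately out of order.
sample_input = [
    {
        "page": 1,
        "moderated_boxes": [],  # empty, so predicted_boxes is used instead
        "predicted_boxes": [
            {"label": "table", "cells": [{"row": 1, "col": 1, "text": "C"}]},
        ],
    },
    {
        "page": 0,
        "moderated_boxes": [
            {"label": "invoice_number", "text": "INV-1"},  # not a table, skipped
            {
                "label": "table",
                "cells": [
                    {"row": 2, "col": 1, "text": "B"},
                    {"row": 1, "col": 1, "text": "A"},
                ],
            },
        ],
        "predicted_boxes": [],
    },
]


def select_table_boxes(input_data):
    """Return (page number, table box) pairs in page order.

    Mirrors the selection logic of the script: moderated boxes win,
    predicted boxes are the fallback.
    """
    selected = []
    for page in sorted(input_data, key=lambda p: p["page"]):
        boxes = page.get("moderated_boxes") or page.get("predicted_boxes", [])
        for box in boxes:
            if box.get("label") == "table":
                selected.append((page["page"], box))
    return selected


for page_number, box in select_table_boxes(sample_input):
    print(page_number, len(box["cells"]))
```

Running a structure like this through the selection step first is a quick way to confirm that the tables you expect are actually being picked up before any cells are added.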

How it Works:

  • Sorting Input Data: The script begins by sorting the input data by page number, ensuring the correct order of processing.
  • Extracting and Sorting Table Cells: On each page, the script uses the moderated boxes if there are any and otherwise falls back to the predicted boxes. For every box labeled "table", it sorts the existing cells by row and column and finds the highest row number.
  • Adding the New Column: For each row, the script adds a new cell with the column number set to 10, label "New column header", and text "New value". You can modify these parameters based on your use case.
  • Error Handling: Any exception raised during processing is caught, and its traceback is sent to the webhook.
  • Webhook Integration: The webhook also receives the sorted input and the final output, so intermediate results can be inspected while debugging.
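
The core step, adding one cell per row, can be tried without a webhook or the full cell schema. The following is a minimal sketch, assuming rows are numbered from 1 (as the row+1 in the script implies). The append_column helper is hypothetical, and its cell dictionaries carry only a subset of the fields that get_table_field sets.

```python
import uuid


def append_column(cells, col, label, text):
    """Return a new, sorted cell list with one extra cell per row in column `col`."""
    # Highest row number present; 0 for an empty table, so nothing is added.
    max_row = max((cell["row"] for cell in cells), default=0)
    new_cells = [
        {
            "id": str(uuid.uuid4()),
            "row": row,
            "col": col,
            "label": label,
            "text": text,
        }
        for row in range(1, max_row + 1)  # assumes 1-indexed rows
    ]
    return sorted(cells + new_cells, key=lambda c: (c["row"], c["col"]))


table = [
    {"row": 1, "col": 1, "label": "Item", "text": "Widget"},
    {"row": 2, "col": 1, "label": "Item", "text": "Gadget"},
]
updated = append_column(table, 10, "New column header", "New value")
print([(c["row"], c["col"], c["text"]) for c in updated])
# [(1, 1, 'Widget'), (1, 10, 'New value'), (2, 1, 'Gadget'), (2, 10, 'New value')]
```

Unlike the script, which leaves the new cells at the end of the list, this sketch re-sorts the result so that each row's cells stay together. It also returns a new list instead of modifying the input.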

Modifications:
You can customize the following parts:

  • Column number: Adjust the value in get_table_field(row+1, 10, ...) to change the position of the new column.
  • Label and Text: Replace "New column header" and "New value" with your desired label and values.
  • Webhook URL: Replace 'YOUR_WEBHOOK_URL' in webhook_url with your own endpoint for receiving log data and errors.
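
If a fixed column number such as 10 is not suitable, the position can be derived from the table itself. Below is a minimal sketch, assuming the new column should sit directly after the last existing one and that values may differ per row. The names next_free_col, column_cells, and values_by_row are hypothetical and not part of the script.

```python
def next_free_col(cells):
    """Return the column number right after the highest existing one (1 if empty)."""
    return max((cell["col"] for cell in cells), default=0) + 1


def column_cells(cells, label, values_by_row, default_text=""):
    """Build (row, col, label, text) tuples for a new column.

    Each tuple can be unpacked into get_table_field(row, col, label, text).
    """
    col = next_free_col(cells)
    # Only rows that actually occur in the table receive a new cell.
    rows = sorted({cell["row"] for cell in cells})
    return [(row, col, label, values_by_row.get(row, default_text)) for row in rows]


existing = [
    {"row": 1, "col": 1}, {"row": 1, "col": 2},
    {"row": 2, "col": 1}, {"row": 2, "col": 2},
]
for args in column_cells(existing, "Currency", {1: "USD"}, default_text="n/a"):
    print(args)
# (1, 3, 'Currency', 'USD')
# (2, 3, 'Currency', 'n/a')
```

Note that this variant iterates over the row numbers present in the table rather than over range(max_row), so it does not create cells for row numbers that are missing from the data.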