Zero Shot Model: Best Practices

Overview:

Nanonets’ zero-shot models extract data without any initial training data: instead of learning from annotated examples, they rely on the prompts, or descriptions, that you write for each label. Because the model uses these prompts to decide what to extract from uploaded documents, it is important to write clear, specific descriptions for every field and table header when you define them in the Manage Label section.

What is the difference between Fields and Table Headers?

  • Fields: A field is expected to have a single value on a given page or within a given document.
Fields Section

  • Table Headers: A table header can have multiple values on a page or within a document, typically one value per table row.
Table Header Section
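To make the distinction concrete, here is a minimal Python sketch of how the results for one page might be shaped. The layout, the label names (invoice_number, invoice_date, description, amount) and the values are hypothetical illustrations, not the actual Nanonets output format.

```python
from typing import Dict, List

# Hypothetical result for a single page. This layout is an assumption
# for illustration only, not the Nanonets response schema.

# Fields: exactly one value per page or document.
fields: Dict[str, str] = {
    "invoice_number": "INV-1001",
    "invoice_date": "2024-01-15",
}

# Table headers: many values, one per table row.
table_rows: List[Dict[str, str]] = [
    {"description": "Widget", "amount": "10.00"},
    {"description": "Gadget", "amount": "25.50"},
]


def column_values(rows: List[Dict[str, str]], header: str) -> List[str]:
    """Collect every value extracted under one table header."""
    return [row[header] for row in rows]


print(fields["invoice_number"])             # a single value for a field
print(column_values(table_rows, "amount"))  # several values for a table header
```

If a label can legitimately appear more than once per page (for example, line-item amounts), defining it as a table header rather than a field avoids forcing the model to pick just one value.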

Best Practices:

  1. Field Naming:

Be precise when naming fields:

  • Choose Descriptive Names: For clarity and precision, select field names that directly reflect their content. For example, use linkedin_username for LinkedIn usernames extracted from resumes, rather than just linkedin.
  • Use Standard Abbreviations: Employ commonly recognized abbreviations to keep field names concise and understandable.
    • Acceptable Example: DOB for "Date of Birth."
    • Unacceptable Example: RMT for "Road Motor Transportation"; use the full term instead.
  • Avoid Truncating Words: Do not shorten words within field names as this can lead to confusion and ambiguity.
  • Use Formal Terminology: When formal terms are available, prefer these over descriptive phrases to maintain professionalism and consistency.
    • Preferred: given_name instead of first_and_middle_name.
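Some of these naming rules can be checked mechanically before you enter labels. The following is a minimal Python sketch of such a check, assuming snake_case-style names and a team-maintained allowlist of accepted abbreviations; the function name check_field_name and the COMMON_ABBREVIATIONS list are hypothetical and not part of Nanonets.

```python
import re
from typing import List

# Hypothetical allowlist of abbreviations your team agrees are widely
# understood. The contents are an assumption; adapt them to your documents.
COMMON_ABBREVIATIONS = {"DOB", "ID", "VAT", "GST"}


def check_field_name(name: str) -> List[str]:
    """Return warnings for a proposed field name (empty list if none).

    A rough heuristic only: it cannot judge whether a name is truly
    descriptive or whether a word was truncated, so use it as a prompt
    for human review rather than a final verdict.
    """
    warnings: List[str] = []
    # Letters/digits separated by single underscores, starting with a letter.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*", name):
        warnings.append("use letters, digits and single underscores only")
    for token in name.split("_"):
        # Treat an all-caps token of two or more characters as an abbreviation.
        if len(token) > 1 and token.isupper() and token not in COMMON_ABBREVIATIONS:
            warnings.append(f"'{token}' may be an unfamiliar abbreviation; spell it out")
    return warnings


print(check_field_name("linkedin_username"))  # no warnings
print(check_field_name("DOB"))                # no warnings: allowlisted
print(check_field_name("RMT"))                # flags the abbreviation
```

Keeping the allowlist explicit mirrors the guidance above: DOB passes because it is commonly recognized, while RMT is flagged so that the full term is used instead.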
  2. Field Description:

Be precise in writing field descriptions: make each field's purpose and expected output clear. Precise descriptions help the model return accurate, useful predictions.

Here are examples to illustrate:

  • Less Effective Example:
    • Field Name: Order_ID
    • Field Description: A number related to an order.
  • More Effective Example:
    • Field Name: Order_ID
    • Field Description: Order_ID is a unique identifier used to track and manage customer orders within a system. If an 'Order_ID' is present in the invoice, always use this 'Order_ID' instead of the 'Purchase_Order_ID'.
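    • Why it works: the description states what the field is and which value to prefer when an invoice contains both an Order_ID and a Purchase_Order_ID, which ensures that orders are identified and tracked correctly.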
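One way to catch vague descriptions before entering them is a simple review script. The Python sketch below assumes you draft labels as name/description pairs; the LabelDefinition class, the review_description function and the 10-word threshold are hypothetical choices for illustration, not a Nanonets API or an official guideline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelDefinition:
    """Hypothetical container for a label drafted before entering it in
    the Manage Label section (not a Nanonets API type)."""
    name: str
    description: str


# Arbitrary threshold (an assumption); very short descriptions tend to be vague.
MIN_DESCRIPTION_WORDS = 10


def review_description(label: LabelDefinition) -> List[str]:
    """Flag descriptions that are likely too vague for zero-shot extraction."""
    issues: List[str] = []
    words = label.description.split()
    if len(words) < MIN_DESCRIPTION_WORDS:
        issues.append(f"{label.name}: description has only {len(words)} words")
    return issues


less_effective = LabelDefinition("Order_ID", "A number related to an order.")
more_effective = LabelDefinition(
    "Order_ID",
    "Order_ID is a unique identifier used to track and manage customer orders "
    "within a system. If an 'Order_ID' is present in the invoice, always use "
    "this 'Order_ID' instead of the 'Purchase_Order_ID'.",
)

print(review_description(less_effective))  # flagged as too short
print(review_description(more_effective))  # no issues
```

Word count is only a proxy: a long description can still be vague, so the real test remains whether it states what the field is and how to resolve ambiguous cases, as the more effective example does.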