Extracting PDF form data – Convert XML string to JSON object and Parse it using Power Automate 

This is the second blog in the “Extract PDF Forms Data” blog series. In this blog post, we will extract PDF form data as XML, Convert XML string to a JSON object and Parse it using Power Automate.

Scenario:

The Scenario will be exactly the same, i.e. Employees download a Form from the company portal, fill in the relevant details, and then upload the form to a SharePoint Document Library. This Power Automate solution(Microsoft Flow) picks up the uploaded form automatically, extracts the data contained within the PDF form, and then adds the data to a SharePoint list.

Before we begin, please make sure the following prerequisites are in place:

  • Power Automate subscription.
  • Muhimbi PDF Converter Services Online Free, full or trial subscription (Start trial). If you are only interested in understanding how to Convert XML to JSON and then parse it, you can skip this step and simply copy the XML which I have provided below.
  • Appropriate privileges to create Power Automate Flows.
  • Working knowledge  Power Automate.

Why Convert XML string to JSON Object? Is Parsing JSON easier that Parsing XML?

JSON’s data types also have a 1:1 mapping(key-value pair) to data while, XML is a markup language and uses tag(<>) structure to represent data items. JSON has a smaller overhead compared to XML and hence it’s easier and quicker to parse (as it’s more lightweight).

But that being said at times Converting XML to JSON might become a little overwhelming and you might want to use the XPATH() to extract meaningful information.

Note – If you are trying to understand how XPath works you can skip Step 1 to Step 5, copy the XML in the compose action and then follow along.

Step 1 – Create a new Flow using the “Automated cloud Flow” option.

Step 2 – Give your Flow a meaning name Enter a name for the Flow, select the “When a file is created in a folder” SharePoint trigger, click “Create“.

In the trigger, specify the path to the SharePoint Online Library to monitor for new files.

Step 3 –  Add the SharePoint “Get file content” action to the Flow Canvas and configure it with reference to the details below:

  • Site Address: Specify the path to the SharePoint Online site-collection which holds the file.
  • File Identifier : “x-ms-file-id” the output from the “When a file is created in a folder” action.

Step 4 – Add the Muhimbi “Convert document” action to the Flow Canvas and configure it with reference to the details below.

  • Source File name: “File name with extension” the output from the “When a file is created or modified (properties only)” action.
    • Source file content: “File Content” the output of the “Get file content” action.
  • Output Format: XML

Step 5 – Add the “Compose” action to the Flow Canvas and add the Expression below.

base64ToString(outputs('Convert_document')?['body/processed_file_content'])

The expression will Convert “Base 64 to String“.

Step 6 –Add the “Compose” action to the Flow Canvas and add the Expression below.

The expression will Convert XML to JSON Object –

json(xml(outputs('Base64_to_XML(String)')))
  • Click on “Test” button on the top right corner.
  • Select the “Manually” option.
  • Upload a supported file type to the folder that is monitored by the configured SharePoint trigger.
  • After a few seconds you should see the output below –

XML output below :

<?xml version="1.0" encoding="utf-8"?>
<fields xmlns:xfdf="http://ns.adobe.com/xfdf-transition/">
  <GivenNameTextBox xfdf:original="Given Name Text Box">Clavin</GivenNameTextBox>
  <FamilyNameTextBox xfdf:original="Family Name Text Box">Fernandes</FamilyNameTextBox>
  <Address1TextBox xfdf:original="Address 1 Text Box">Sai Jewel</Address1TextBox>
  <HousenrTextBox xfdf:original="House nr Text Box">Central Park</HousenrTextBox>
  <Address2TextBox xfdf:original="Address 2 Text Box">Kharghar</Address2TextBox>
  <PostcodeTextBox xfdf:original="Postcode Text Box">410210</PostcodeTextBox>
  <CityTextBox xfdf:original="City Text Box">Navi Mumbai</CityTextBox>
  <CountryComboBox xfdf:original="Country Combo Box">India</CountryComboBox>
  <GenderListBox xfdf:original="Gender List Box">Man</GenderListBox>
  <HeightFormattedField xfdf:original="Height Formatted Field">180</HeightFormattedField>
  <DrivingLicenseCheckBox xfdf:original="Driving License Check Box">Yes</DrivingLicenseCheckBox>
  <FavouriteColourListBox xfdf:original="Favourite Colour List Box">Yellow</FavouriteColourListBox>
</fields>

JSON Output :

{
  "?xml": {
    "@version": "1.0",
    "@encoding": "utf-8"
  },
  "fields": {
    "@xmlns:xfdf": "http://ns.adobe.com/xfdf-transition/",
    "GivenNameTextBox": {
      "@xfdf:original": "Given Name Text Box",
      "#text": "Clavin"
    },
    "FamilyNameTextBox": {
      "@xfdf:original": "Family Name Text Box",
      "#text": "Fernandes"
    },
    "Address1TextBox": {
      "@xfdf:original": "Address 1 Text Box",
      "#text": "Sai Jewel"
    },
    "HousenrTextBox": {
      "@xfdf:original": "House nr Text Box",
      "#text": "Central Park"
    },
    "Address2TextBox": {
      "@xfdf:original": "Address 2 Text Box",
      "#text": "Kharghar"
    },
    "PostcodeTextBox": {
      "@xfdf:original": "Postcode Text Box",
      "#text": "410210"
    },
    "CityTextBox": {
      "@xfdf:original": "City Text Box",
      "#text": "Navi Mumbai"
    },
    "CountryComboBox": {
      "@xfdf:original": "Country Combo Box",
      "#text": "India"
    },
    "GenderListBox": {
      "@xfdf:original": "Gender List Box",
      "#text": "Man"
    },
    "HeightFormattedField": {
      "@xfdf:original": "Height Formatted Field",
      "#text": "180"
    },
    "DrivingLicenseCheckBox": {
      "@xfdf:original": "Driving License Check Box",
      "#text": "Yes"
    },
    "FavouriteColourListBox": {
      "@xfdf:original": "Favourite Colour List Box",
      "#text": "Yellow"
    }
  }
}

Step 7 – Add the “Parse JSON” action.

  • Content – Add the output of the “XML to JSON” action.
  • Click on “Generate from Sample” and copy the JSON from the above step.

Step 8 – Add SharePoint “Create Item” action, we will directly pass the expressions in the “Create item” action.

  • Site Address: Enter the location of the SharePoint list where list item should be added.
  • List Name: Select the SharePoint list where the list item should be added

Note: In this step, we will directly add the Expressions and Maps the Expressions to the SharePoint list.

  • Title – body(‘Parse_JSON’)?[‘fields’]?[‘GivenNameTextBox’]?[‘#text’]
  • FamilyNameTextBox – body(‘Parse_JSON’)?[‘fields’]?[‘FamilyNameTextBox’]?[‘#text’]
  • Address1TextBox – body(‘Parse_JSON’)?[‘fields’]?[‘Address1TextBox’]?[‘#text’]
  • HousenrTextBox – body(‘Parse_JSON’)?[‘fields’]?[‘HousenrTextBox’]?[‘#text’]

Note – You can also directly pass the #text field but it’s too confusing so we used an expression instead.

That’s it- you’re done!  Publish your Flow and upload a supported file type to the folder that is monitored by the specified SharePoint trigger. After a short wait, a new list item within the target SharePoint list should have been created.

Subscribe to this blog for the latest updates about SharePoint Online, Microsoft Flow, Power Apps and document conversion and manipulation using The PDF Converter.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s