So you wanna load the content of your Google Forms programmatically? Access all the questions, answer options, field types, submit IDs and so on as you wish? 😉 Stick around!
Hope y’all remember my last blog post, SCRIPTfully scrape off your Google Forms Field Ids…, where I shared a neat little script I built to extract the Field identifier IDs from your Google Forms page, so that you can use them with ease for auto filling question answer fields or submitting data into your Google Forms using the REST API! So this is the next step in this series of Google Forms Hacks!
In this post we’re gonna load your Google Forms skeleton programmatically!
What does it mean?
Yes, instead of loading your Google Forms page on the typical web browser, why not load the bare bone content of it as you wish? By this I mean,
- List of Questions (your question content…)
- The types of Questions (Short Answer, Paragraph, Checkbox, etc)
- Available Answer options (Multiple choice answer questions…)
- Title and Description of your Google Form, etc…
- And many more details you have added to your Google Form…
Once you get access to those bare bone content or the skeleton structure of your Google Form, then you can do all kinds of stuff with it…
Then you can render it as you wish and present to your users, re-render it into a Web App or a Desktop app or even a Mobile App with your own custom layouts, filtering, and validations! 😀
So how we gonna do it?
Simply put, we gonna extract the skeleton or the bare bone structure from our Google Forms page. As you may have figured out, at the time of writing there’s no official API or SDK to access Google Forms services programmatically, therefore we obviously have to hack our way around this!
We’re going to build a little script which will load the HTML content of our Google Form and perform a magical algorithm to extract our Google Forms structure! All the questions, answer options, validation, etc. The whole deal… 😉
Backstory…
Hope you remember how, in my last post, I shared about scraping through the HTML content of our Google Form page to extract the Field identifiers which are used to submit answers, otherwise known as Field Answer identifiers.
Around that same time I was experimenting with extracting the whole structure of the form in the same manner. But filtering through the HTML tags to retrieve the complex structure of a given Google Forms Question-Answer field seemed quite hectic.
Jackpot!
So while I was going up and down the HTML of the page, trying to find a better way to extract our Google Forms Question-Answer structure, I came across this interesting piece of script at the end of the page.
As you can see at the bottom there’s a JavaScript snippet starting with “FB_PUBLIC_LOAD_DATA_ = [null…” with a strange pattern of content. I’m not sure what its purpose is in this page, but you can surely find it in any Google Forms page. This specific script snippet seems to be holding the whole skeleton of our Google Forms page: as you can see, it contains the Question Content, Answer Fields and so on!
So there’s our little treasure… 😉
Now finding this bit wasn’t enough at all, I had to figure out how to parse this properly to extract the data that we’re looking for, as you can see it’s got a little unorthodox structure in its content. Now that’s the next challenge!
Let the hacking begin…
Now for this post also let’s use the same sample questionnaire Google Form that I created for last post’s demo.
https://docs.google.com/forms/d/e/1FAIpQLSeuZiyN-uQBbmmSLxT81xGUfgjMQpUFyJ4D7r-0zjegTy_0HA/viewform
So we’re going to load the HTML content of our Google Forms page, and then we’ll extract that mystery javascript code snippet, and parse the content of it to access the skeleton structure of our Google Forms!
Now first let me walk you through how to parse that mystery FB_PUBLIC_LOAD_DATA_ content! 😉 Let me warn you though, figuring this out was no walk in the park, but let me share the secret sauce with y’all straight away.
Let me copy and paste it here from my sample Google Form and get started! Below you can clearly see how all the Questions along with Answers and Field identifiers in my sample Google Form are contained in this mystery code snippet.
var FB_PUBLIC_LOAD_DATA_ = [null,["Please fill up the following questions. You need to answer all the fields please! ;) ",[[122249536,"Hello there, this is question 1, could you answer?",null,0,[[1277095329,null,0]]],[1170747525,"Which one would you prefer as the answer from below?",null,2,[[995005981,[["Mango Peach",null,null,null,0],["Banana Plums",null,null,null,0],["Strawberry Pears",null,null,null,0]],1,null,null,null,null,null,0]]],[2147453523,"Well another question here wouldn't hurt now eh?",null,4,[[1155533672,[["Monkeys with hoodies",null,null,null,0],["Dogs with hats",null,null,null,0],["Cats with crowns",null,null,null,0]],1,null,null,null,null,null,0]]],[172187917,"How about this for a change?",null,3,[[1579749043,[["Running Banana",null,null,null,0],["Jumping Apples",null,null,null,0],["Rolling Pears",null,null,null,0]],1,null,null,null,null,null,0]]],[676251522,"What's the date would you like to be today?",null,9,[[815399500,null,1,null,null,null,null,[0,1]]]],[1280585510,"What time would it be right now? ",null,10,[[940653577,null,1,null,null,null,[0]]]]],null,null,null,[0,0],null,null,"Sample questionnaire!",48,[null,null,null,null,0],null,null,null,null,[2]],"/forms","Random Sample Questionnaire",null,null,null,"0",null,0,0,"","",0,"e/1FAIpQLSeuZiyN-uQBbmmSLxT81xGUfgjMQpUFyJ4D7r-0zjegTy_0HA",1];
In order to parse this into a known data structure you need to first remove the “var FB_PUBLIC_LOAD_DATA_ = ” and the “;” at the end if you notice carefully.
However by looking at the content you can make a guess that it should be some JSON based content structure.
So let’s use any online JSON parser tool and you should be able to see the formatted content as follows. ex: jsoneditoronline, codebeautify or jsonformatter
Now you get some readable structure, yeah? You can traverse through its content as a tree of expandable array nodes.
Pattern Recognition and Analysis!
Since we parsed the mystery content into a JSON Array tree, we can now traverse through the data easily and extract the specific data that we’re looking for.
Now since there’s no official documentation on how to parse this data structure, I guess we’re going to have to figure out ourselves where to pick what data in this tree, and recognize the pattern.
So basically we are going to focus on retrieving the following data,
- Google Form Title, Description, Form ID
- List of Question Fields
And in each question,
- Question Field Text
- Question Type
- If submitting Answer is mandatory or not
- Available answer list (Multiple answer selection)
- Question Field Identifier (Field Id) or Answer Submission ID
Now that I believe comprises the complete structural skeleton of any given Google Form! 🙂
– General Google Form data
In the parent root you can see the Google Form Doc’s Name in the [3] index.
Then the Form Id is in index [14] of the array tree. The rest of the important bits seem to be in the [1] index node; as you can see it has a lot of child nodes in it. Let’s look into that..
Now this index item seems to contain a lot of information in its child nodes. In its child node index [0] you can see the Description of our Google Form, and index [8] holds the Title.
Next the most important node, that is node index [1] which contains all the Question Fields data, and it looks like this.
Once you expand it you can view the List of child elements that represents the Question Fields in your Google Form!
Now you know you can easily access the Question fields by traversing [1][index of the question field] inside that node, which is [1][1][index of the question field] counted from the root.
Let’s try opening up a child node then! 😉
– Question Field data
Woot! Here you have it, the whole Question Field as expected, and specifically in this sample questionnaire, you can see how it shows all the details regarding the 1st Question Field.
Also notice the value I’m pointing to below, node [1][0][3] to be exact (counted from inside the root’s [1] node, so [1][1][0][3] from the root). That index holds the Question Type value “0”, which determines whether it’s a Short Answer, Paragraph, Dropdown, Checkbox field and so on. Then the second arrow, node [1][0][4][0][0], is the unique Field identifier that we need to use when submitting an answer to this question field. 😀 Thereby we could consider that value as the Answer Submit Id as well.
Now that’s a single answer field, such as the Short Answer and Paragraph Question field types in Google Forms.
The node [1][0][4][0][2] holds the value that determines whether an Answer is Required or not.
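To make those index paths a bit more concrete, here is a minimal C# sketch that reads the same values out of the first question of my sample form. It assumes Newtonsoft.Json is installed, and the variable names (questionText, questionType, answerSubmitId, isAnswerRequired) are just ones I picked for illustration. Note that it parses the question node on its own, so the paths start at the question item rather than at the root.

```csharp
using System;
using Newtonsoft.Json.Linq;

// First question of the sample form, parsed on its own.
// In the full tree this node sits at root[1][1][0].
JArray field = JArray.Parse(
    "[122249536,\"Hello there, this is question 1, could you answer?\",null,0,[[1277095329,null,0]]]");

string questionText = (string)field[1];            // Question Field Text
int questionType = (int)field[3];                  // Question Type value (0 = Short Answer)
long answerSubmitId = (long)field[4][0][0];        // Field identifier / Answer Submit Id
bool isAnswerRequired = (int)field[4][0][2] == 1;  // stored as 1 or 0 in the data

Console.WriteLine($"{questionText} | type {questionType} | id {answerSubmitId} | required {isAnswerRequired}");
```

The explicit casts on JToken do the conversion from JSON values to plain C# types; the required flag needs the extra comparison because it is a number in the data, not a boolean.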
So how about we open up a child node of a Multiple Answer field?
Now below I’ve opened up the node of a Multiple Answer selection Question Field.
– Multiple Answer Question Field data
Right here you can see in the Question Type value node it has number “2” as the value, which is what Multiple Answer Question Fields are denoted with.
Over here you get an extra child node underneath the Field identifier node as you recognized above.
This has a list of child nodes which as you can see holds the list of Answers Available for the Question Field.
So now we know how to load the list of Answers available for a given question, that we can traverse through.
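Here is a small C# sketch of that traversal, using the second question from my sample form. Again it assumes Newtonsoft.Json, and it parses the question node on its own, so [4][0][1] is counted from the question item. The null check is my own precaution, since single answer fields carry null in that position.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Second question of the sample form (a Multiple Choice field), parsed on its own.
JArray field = JArray.Parse(
    "[1170747525,\"Which one would you prefer as the answer from below?\",null,2," +
    "[[995005981,[[\"Mango Peach\",null,null,null,0],[\"Banana Plums\",null,null,null,0]," +
    "[\"Strawberry Pears\",null,null,null,0]],1,null,null,null,null,null,0]]]");

var answerOptionsList = new List<string>();

// The options node sits right next to the Field identifier, at [4][0][1].
JToken options = field[4][0][1];
if (options.Type == JTokenType.Array)
{
    foreach (JToken option in options)
    {
        // Each option is itself an array; its text is the first element.
        answerOptionsList.Add((string)option[0]);
    }
}

Console.WriteLine(string.Join(", ", answerOptionsList));
```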
Next let me dive a bit deep into Question Type identifier values..
– Hunt for Question Field types…
You already know Google Forms provides multiple types of Question Fields that can be added to your Google Form, and you saw above where to grab the Field type value. But how do you know which value maps to what type?
That’s why I had to run a trial and error recognition of trying to match those numeric values to the actual Field types, and I’ve finalized the list as follows…
- Short Answer Field = 0
- Paragraph Field = 1
- Multiple Choice Field = 2
- Check Boxes Field = 4
- Drop Down Field = 3
- File Upload Field = 13 (we’re not going to implement this one right now, because it needs a user log in session implementation, which is a bit complicated. So let’s look into it later!)
- Linear Scale Field = 5
- Grid Choice Field = 7 (represents both: Multiple Choice Grid & Checkbox Grid)
- Date Field = 9
- Time Field = 10
As you can see, the numeric values assigned to the available Field types are not so straightforward, and Google Forms tends to mix up or skip some values without a proper order. That’s why I mentioned it was a bit of a painful trial and error process.
Now that we’ve walked through the data pathways that we’re hoping to extract, let’s summarize our analysis as follows..
- The most crucial node is index [1] which holds all the data we need
- Although Google Form’s Id is in the root node [14]
- Description is in the node [1][0]
- Title is in the node [1][8]
- All the Question Field data is in node [1][1]
- We need to traverse through this child node list and fetch each item. Inside each item:
  - Question Field Text is in [1]
  - Question Field Type is in [3]
  - Question Field Id is in [4][0][0]
  - Question Field Answer required or not is in [4][0][2]
  - Multiple Answer Options are in [4][0][1]
  - Multiple Answer Options need to be traversed and loaded
- We need to create a mapping for numeric values of Question Field Types to readable values, Short Answer Field, Paragraph Field, Multiple Choice Field, etc.
So that’s the complete list of analysis that we have derived from this step, which we need to carry forward to our next level, that is implementing all this logic and retrieving the complete skeleton structure of your Google Forms page!
Let the coding begin!
So just like in the previous article, I’m going to use dotnet and C# as the language for our little code snippet. To parse the HTML content, I chose HtmlAgilityPack. Then we need Newtonsoft.Json to parse our JSON data structure. Also I’ll be using a Console Project type in dotnet, pretty simple to begin with.
Given you have created the project and added HtmlAgilityPack and Newtonsoft.Json to your dotnet project, let’s start by creating the model class.
Here’s the Enum mapping for the Question Field Types that we identified before, so that we can easily cast our scraped out values in code.
    using System.ComponentModel; // needed for the [Description] attribute

    /// <summary>
    /// Found the Field type representation values with trial
    /// and error try out of blood sweat and tears lol! 😉
    /// </summary>
    public enum GoogleFormsFieldTypeEnum
    {
        [Description("Short Answer Field")]
        ShortAnswerField = 0,

        [Description("Paragraph Field")]
        ParagraphField = 1,

        [Description("Multiple Choice Field")]
        MultipleChoiceField = 2,

        [Description("Check Boxes Field")]
        CheckBoxesField = 4,

        [Description("Drop Down Field")]
        DropDownField = 3,

        // FileUpload – Not supported (needs user log in session)
        [Description("File Upload Field")]
        FileUploadField = 13,

        [Description("Linear Scale Field")]
        LinearScaleField = 5,

        // represents both: Multiple Choice Grid | Checkbox Grid
        [Description("Grid Choice Field")]
        GridChoiceField = 7,

        [Description("Date Field")]
        DateField = 9,

        [Description("Time Field")]
        TimeField = 10,
    }
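To show how such an Enum could be put to use, here is a minimal sketch that turns a raw numeric type value into its readable [Description] text. DescribeFieldType is a hypothetical helper name I made up for this illustration, and the enum inside the block is a trimmed copy so that the sketch compiles on its own. The fallback for unknown values is my own addition, since Google Forms may use values that are not in the mapping.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Raw type values as they appear in the form data; 8 is not in the mapping.
foreach (int rawValue in new[] { 0, 2, 9, 8 })
{
    Console.WriteLine($"{rawValue} -> {DescribeFieldType(rawValue)}");
}

// Hypothetical helper: returns the [Description] text for a raw type value.
static string DescribeFieldType(int rawValue)
{
    // Guard against values we have not mapped yet.
    if (!Enum.IsDefined(typeof(GoogleFormsFieldTypeEnum), rawValue))
        return "Unknown Field type";

    var fieldType = (GoogleFormsFieldTypeEnum)rawValue;
    FieldInfo member = typeof(GoogleFormsFieldTypeEnum).GetField(fieldType.ToString());
    var attribute = member.GetCustomAttribute<DescriptionAttribute>();

    // Fall back to the enum member name if no description is attached.
    return attribute?.Description ?? fieldType.ToString();
}

// Trimmed copy of the enum so the sketch compiles on its own.
enum GoogleFormsFieldTypeEnum
{
    [Description("Short Answer Field")] ShortAnswerField = 0,
    [Description("Multiple Choice Field")] MultipleChoiceField = 2,
    [Description("Date Field")] DateField = 9,
}
```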
Now you might say shouldn’t we create Model classes to represent Google Forms Fields and Google Form parent objects themselves, but I would rather keep that to a future post! 😉 let’s just try to keep things simple in this one!
Then let’s code the method that we’ll be executing to load our Google Form structure! I’m gonna be calling it ScrapeOffFormSkeletonFromGoogleFormsAsync(), with a parameter which will carry the URL link to a given Google Form! 😀
Let’s begin by adding the simple LoadFromWebAsync() call using HtmlAgilityPack, which will load the HTML content first.
Next let’s access the FB_PUBLIC_LOAD_DATA_ script content in our HTML doc.
Here we are filtering the HTML nodes selected by the “//script” definition down to the one which contains the “FB_PUBLIC_LOAD_DATA_” value in it. Then we load it into a variable fbPublicLoadDataJsScriptContent, which is of type string.
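A minimal sketch of that filtering step could look like the following. It assumes HtmlAgilityPack is installed, and to keep it runnable offline it feeds in a tiny made-up HTML string instead of a live page. For a real form, the document would come from the LoadFromWebAsync() call on HtmlWeb, as noted in the comment.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Made-up stand-in for the page HTML so the sketch runs offline.
// For a live form the document would come from:
//   HtmlDocument htmlDoc = await new HtmlWeb().LoadFromWebAsync(formUrl);
// where formUrl is the link to your Google Form.
string html =
    "<html><body><script>var other = 1;</script>" +
    "<script type=\"text/javascript\">var FB_PUBLIC_LOAD_DATA_ = [null,[\"Sample description\",[]]];</script>" +
    "</body></html>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// SelectNodes returns null when nothing matches, hence the null-conditional call.
var scriptNodes = htmlDoc.DocumentNode.SelectNodes("//script");
var fbPublicLoadDataJsScriptContent = scriptNodes?
    .Select(node => node.InnerHtml)
    .FirstOrDefault(content => content.Contains("FB_PUBLIC_LOAD_DATA_"));

Console.WriteLine(fbPublicLoadDataJsScriptContent ?? "FB_PUBLIC_LOAD_DATA_ script not found");
```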
Next on, we gotta clean it up by removing the “var FB_PUBLIC_LOAD_DATA_ = ” and the “;” at the end if you notice carefully. So that we can parse the data to a JSON Array structure.
Now we’re ready to parse the content into a JSON Array using Newtonsoft.Json as below.
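The clean-up and parsing steps can be sketched together like this. It is only one possible way of doing it, assuming the script content has exactly the “var FB_PUBLIC_LOAD_DATA_ = ” prefix and the trailing “;” we saw earlier. The sample string here is a shortened stand-in, not the real content of a form.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Shortened stand-in for the script content; the real one is much longer.
string fbPublicLoadDataJsScriptContent =
    "var FB_PUBLIC_LOAD_DATA_ = [null,[\"Sample description\",[]],\"/forms\",\"Sample doc name\"];";

// Remove the JavaScript assignment at the front and the semicolon at the end.
string json = fbPublicLoadDataJsScriptContent
    .Replace("var FB_PUBLIC_LOAD_DATA_ = ", "")
    .Trim()
    .TrimEnd(';');

// What is left is a plain JSON array that Newtonsoft.Json can parse.
JArray root = JArray.Parse(json);

Console.WriteLine($"Top level nodes: {root.Count}");
Console.WriteLine($"Doc name: {(string)root[3]}");
```

If Google ever changes the spacing or the variable declaration, the Replace call would silently leave the prefix in place and JArray.Parse would throw, so that is a handy spot to add your own error handling.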
And also let’s access some basic data of our Google Form, such as Title, Description and the Form ID just like how we discussed in our pattern analysis of this array object structure. Then we load the most important index of the array [1][1] into arrayOfFields variable at the bottom.
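As a rough sketch of that step, here is how the basic data could be picked out of the parsed array using the index paths from our analysis. The JSON string is my sample form’s data with the question list emptied out to keep it short, and the variable names other than arrayOfFields are my own picks for illustration.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Sample form data with the question list left empty to keep the sketch short.
JArray root = JArray.Parse(
    "[null,[\"Please fill up the following questions.\",[],null,null,null,[0,0],null,null," +
    "\"Sample questionnaire!\"],\"/forms\",\"Random Sample Questionnaire\",null,null,null," +
    "\"0\",null,0,0,\"\",\"\",0,\"e/1FAIpQLSeuZiyN-uQBbmmSLxT81xGUfgjMQpUFyJ4D7r-0zjegTy_0HA\",1]");

string description = (string)root[1][0];      // Description of the Google Form
string title = (string)root[1][8];            // Title of the Google Form
string formId = (string)root[14];             // Form ID
JArray arrayOfFields = (JArray)root[1][1];    // all the Question Field items

Console.WriteLine($"TITLE: {title}");
Console.WriteLine($"DESCRIPTION: {description}");
Console.WriteLine($"FORM ID: {formId}");
Console.WriteLine($"FIELD COUNT: {arrayOfFields.Count}");
```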
Next on we are going to traverse through the list of field data indexes, but here I have added a special filter to identify if the given Field object is an actual Question or a Field placed as a Description Panel or an Image banner, which I have noticed people do to customize their Google Forms. In that case we ignore that object and move on to the next iteration.
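One way to sketch such a filter is shown below. Keep in mind this rests on an assumption of mine: that only real questions carry a populated answer node at index [4], while description panels and image banners do not. The second item in the sample data is made up purely to illustrate the skip, so do check it against your own forms.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Two items: a real question from the sample form, and a made-up
// non-question item whose [4] node is null (purely for illustration).
JArray arrayOfFields = JArray.Parse(
    "[[122249536,\"Hello there, this is question 1, could you answer?\",null,0,[[1277095329,null,0]]]," +
    "[555000111,\"Just a description panel\",null,6,null]]");

int questionCount = 0;
foreach (JToken field in arrayOfFields)
{
    // Assumption: only real questions carry a populated answer node at index [4].
    bool hasAnswerNode = field is JArray item && item.Count > 4 && item[4].HasValues;
    if (!hasAnswerNode)
        continue; // skip description panels, image banners and the like

    questionCount++;
    Console.WriteLine((string)field[1]);
}

Console.WriteLine($"Questions kept: {questionCount}");
```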
There we are looping through each item in arrayOfFields and skipping off the filtered objects. We’re first loading the Field Question text value, then extracting the Question Type value, using the Enum that we built before to map it to the readable Field Type values.
Speaking of accessing Answer Options List for Multiple Choice Questions, we’re handling that next in this code bit.
And we load it up to answerOptionsList object.
Then we load our next values, the Field Answer Submit ID and the value representing whether the answer is required or not, with a conversion to a boolean, which is true or false.
For the IsAnswerRequired value Google gives us “1” or “0” as the representation of true or false, so we need to do that mapping ourselves.
Then as the last stretch of our loop, let’s print it all out to the Console.
There, now the data related to each field is printed out to the Console nicely.
Let me share the full code snippet that puts it all together below. Strap your seat belts fellas, it’s a long code snippet, therefore I’ll only put a link to it here! 😉
https://gist.github.com/UdaraAlwis/c338a9de4af4509ba0ff67e2c4f37f5c
Yeah click on that Gist link and view the full code snippet over at my Github!
You can use the above method in any of your dot net projects, as long as you have HtmlAgilityPack and Newtonsoft.Json nugets installed and imported in the code. Application is yours to imagine yo, just pass in your Google Forms link text to the method and you’re good to go!
Hit F5 and Run!
Now if you’re on Visual Studio, let’s just run this little snippet of magic eh! 😉
TADAAA! 😀
Here I’m using my simple demo Google Form link, passing it into the method in this little Console dotnet app, and you can see how it nicely loads all the question field data and all the information about my Google Form page.
Here’s a fun side-by-side comparison of programmatically accessing your complete Google Forms skeleton!
Pretty cool eh!
As far as my testing goes, this little script works perfectly for any Google Form that contains the basic main types of Question Fields that are available in Google Forms as of this day!
Imagination is the limit y’all! 😉
Well… That’s it!
Who would have thought the FB_PUBLIC_LOAD_DATA_ is such a mysterious yet awesome data snippet hiding in the rendered HTML content of a given Google Forms page! lol 😀
During my experimental research of cracking this mystery, I got some hints from the following Python hacks, and I derived the same logic into dotnet C# code.
https://gist.github.com/davidbau/8c168b2720eacbf4e68e9e0a9f437838
https://gist.github.com/gcampfield/cb56f05e71c60977ed9917de677f919c
Now keep in mind we do not have precise control over whether Google will change these formats and data structure patterns in future, so you gotta keep an eye out if you’re planning to use these hacks for a long term solid implementation. My suggestion would be to write up a series of Test cases (TDD yo!)
There you have it, the little magic script to programmatically access your complete Google Forms skeleton!
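As a starting point for such test cases, here is a small sketch of a structure check you could run against the parsed data. FindShapeProblem is a hypothetical helper of my own, it only verifies the index paths we identified in this post, and in a real project you would probably wrap it in your test framework of choice.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Sample form data trimmed down to a single question.
JArray root = JArray.Parse(
    "[null,[\"Please fill up the following questions.\",[[122249536," +
    "\"Hello there, this is question 1, could you answer?\",null,0,[[1277095329,null,0]]]]," +
    "null,null,null,[0,0],null,null,\"Sample questionnaire!\"],\"/forms\"," +
    "\"Random Sample Questionnaire\",null,null,null,\"0\",null,0,0,\"\",\"\",0," +
    "\"e/1FAIpQLSeuZiyN-uQBbmmSLxT81xGUfgjMQpUFyJ4D7r-0zjegTy_0HA\",1]");

string problem = FindShapeProblem(root);
Console.WriteLine(problem.Length == 0 ? "Structure looks as expected" : problem);

// Hypothetical helper: returns a message for the first broken assumption,
// or an empty string when every checked index path still holds.
static string FindShapeProblem(JArray root)
{
    if (root.Count <= 14)
        return "root has fewer than 15 nodes";
    if (root[14].Type != JTokenType.String)
        return "root[14] (Form ID) is not a string";
    if (!(root[1] is JArray form) || form.Count <= 8)
        return "root[1] is not the expected form node";
    if (form[0].Type != JTokenType.String)
        return "root[1][0] (Description) is not a string";
    if (form[8].Type != JTokenType.String)
        return "root[1][8] (Title) is not a string";
    if (form[1].Type != JTokenType.Array)
        return "root[1][1] (question list) is not an array";

    foreach (JToken field in form[1])
    {
        if (!(field is JArray item) || item.Count <= 3)
            return "a field item has fewer than 4 nodes";
        if (item[3].Type != JTokenType.Integer)
            return "a field's type value at [3] is not a number";
    }
    return string.Empty;
}
```

Running a check like this against one of your own known forms every now and then should tell you early if Google has shuffled the structure around.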
Share the love! 😀 Cheers!