Artificial Intelligence and Power Platform Automation for For Scanned Document Parsing and Reporting

The Current Process

Our journey commenced with the influx of numerous filled documents from the previous year, alongside the continuous arrival of new files from contractors. These documents, comprising nine pages each, detailed the yearly Solid Waste Hauler's Tonnage Reports, mandated by Virginia Code Section 10.1-1411 and required by Arlington County. Commercial hauling companies were obligated to submit an Annual Solid Waste & Recycling Report to the county for the preceding calendar year.

Traditionally, these data-filled documents were manually entered into spreadsheets, predominantly Excel, and transformed into charts for managerial analysis. However, this manual process proved laborious, consuming significant employee resources. Through meticulous auditing, we uncovered a startling revelation: approximately 15-20 percent of entries were error-prone, exacerbating the inefficiencies of the existing system.

Brainstorming Solutions In the DSS-PMO Group and Delivering MVPs

The brainstorming process kicked off with a range of initial solutions, each presenting its own merits and limitations, which were subsequently presented to the chief department of Solid Waste. Among the proposed solutions were:

Microsoft Forms & Google Forms: Despite their user-friendly interfaces, these platforms lacked the dynamic capability to adapt to changing selection criteria, often resulting in excessively lengthy forms.

Survey Monkey: While offering robust survey functionalities, its county-wide subscription requirement posed a barrier, especially considering the limited tech-savviness of haulers.

Versiform: This internal form-generating tool, utilizing C++ coding, showed promise but necessitated collaboration with other departmental staff. However, its results fell short of providing a high-level solution.

PowerApps: Ideal for internal staff within companies, PowerApps presented access challenges for external contractors, limiting its applicability.

✅Azure AI Document Intelligence (OCR): Leveraging the current form, this solution demonstrated potential in automating document processing through optical character recognition technology, promising a more streamlined approach to data extraction.

One of the Report Pages

Custom Extraction Model in Azure AI Document Intelligence

The Azure AI Document Intelligence boasts an impressively trained model, particularly user-friendly when utilizing pre-trained models. However, our endeavor required the utilization of a custom extraction model tailored to accommodate the intricacies of the nine-page documents, which often exhibited variations among haulers. A notable challenge emerged when certain haulers neglected to fill or scan sections of the form that didn't apply to them, complicating the extraction process significantly.

In a strategic move to optimize resources, we opted for a cost-effective approach by submitting all files and retrieving results in JSON format via Python. Yet, our initial hurdle arose when the model encountered numerous tables, hindering Azure AI Document Intelligence's ability to provide confidence scores for accurate field identification. This limitation posed a considerable obstacle, prompting us to devise an innovative solution. Recognizing that Azure DI provides confidence scores for individual words, we engineered a novel Python function to calculate the average confidence score for consecutive words constituting a cell string. This breakthrough empowered us to overcome the challenge, enabling us to pinpoint areas requiring further review with precision. Subsequently, in the validation phase of our project, this methodology proved invaluable in flagging confidence discrepancies, facilitating swift resolution through either hauler or staff review of the scanned documents.

High Level PowerBI Dashboard

Once we completed the export and manipulation of the data, transforming them into multiple Excel tabs, each representing a distinct section of the report, we seamlessly integrated Power BI into our workflow. Harnessing the power of the Power Query editor, we embarked on an exhaustive journey of data cleaning and transformation to ensure the utmost accuracy in our insights. However, our commitment to precision didn't end there.

Despite the confidence scores provided by Azure AI Document Intelligence, we recognized the importance of capturing errors that fell outside these scores. For instance, instances where haulers incorrectly filled in data adjacent to the correct spot in a table, resulting in misinterpretation of material sections. Additionally, incomplete submissions posed a challenge, leaving us unsure whether the absence of data was intentional or an oversight. To address these nuances, we leveraged Power BI's robust capabilities to highlight such discrepancies for review by our staff. On the other hand, in some fields values there missing points that makes the numbers so big and we should capture outliers, in these cases the confidence level returning by Azure DI is high and we can't only rely on it.

Furthermore, we encountered issues stemming from incorrect form filling practices, such as haulers inputting their own categories instead of adhering to predefined ones. This led to the generation of extraneous columns during our transformation process, complicating data analysis, especially when dealing with files processed as a collective folder. Through meticulous attention to detail and innovative problem-solving, we navigated these challenges, ensuring that our data-driven insights remained accurate and actionable, ultimately driving informed decision-making within the organization.

In addressing the aforementioned challenges, our Power BI dashboard was meticulously crafted to provide comprehensive insights and facilitate efficient review processes. We structured the dashboard with a focus on enhancing staff oversight and enabling thorough data examination.

To tackle the issue of identifying errors and inconsistencies, we designed two dedicated tabs within the dashboard. These tabs served as a centralized hub for staff to meticulously review confidence scores and verify data accuracy via PowerApps, ensuring that no detail went unnoticed. Additionally, to streamline the review of aggregated report values, we meticulously curated five tabs, each offering a comprehensive overview of current performance metrics derived from the reports.

Furthermore, recognizing the importance of benchmarking against previous years' performance, we integrated two tabs dedicated to comparing current hauler performance with historical data. This comparative analysis not only provides valuable insights into trends and patterns but also aids in identifying areas for improvement or commendation.

As part of our ongoing effort to maintain data integrity and operational efficiency, we incorporated proactive measures to address hauler non-compliance. With each new cycle of hauler reports, our dashboard automatically flags non-submissions, prompting staff to follow up with the respective haulers. This proactive approach ensures timely intervention, whether it be to remind haulers of submission deadlines or to ascertain their market status.

Through the seamless integration of data visualization and proactive monitoring features, our Power BI dashboard empowers staff with the tools they need to uphold data accuracy, drive strategic decision-making, and foster continuous improvement within the organization.

Giving the Abilty to Edit the Incorrect Data to the Auditing User Via Connecting PowerBI to ApowerAPPs

Positioned strategically in the upper right corner of our Dashboard, lies a pivotal feature designed to streamline the data review and editing process for staff members. Leveraging the insights gained from reviewing reports and assessing the risk associated with each file, staff are equipped with the capability to seamlessly transition to a PowerApp interface for data editing. This integration not only enhances efficiency but also empowers staff to swiftly address any discrepancies or inaccuracies identified during the review process.

Central to this workflow is a meticulously crafted PowerApp interface, intuitively designed to facilitate seamless data editing. With user-friendly functionalities and a structured layout, the PowerApp empowers staff to make necessary amendments with ease and precision. Whether correcting inaccuracies, updating information, or adding supplementary details, the PowerApp provides a streamlined platform for data refinement.

Furthermore, to ensure real-time data accuracy and consistency, we implemented a Power Automate workflow seamlessly integrated with our Power BI dataset. This innovative automation triggers dataset refreshing in real time, ensuring that any changes made within the PowerApp are immediately reflected in the Dashboard's metrics and visualizations. This seamless synchronization between data editing and dashboard updates not only enhances data integrity but also fosters agility in decision-making processes.

By embracing the synergy between PowerApp for data editing and Power Automate for real-time dataset refreshing, our dashboard ecosystem empowers staff with the tools they need to maintain data accuracy, drive actionable insights, and foster a culture of continuous improvement within the organization.

Managerial Dashboard and Year-to-Year Comparison

In conclusion, this project represents a pure business intelligence endeavor aimed at revolutionizing data management and decision-making processes within the organization. By harnessing the power of Power BI, PowerApps, and Power Automate, we have constructed a dynamic dashboard ecosystem that empowers staff with comprehensive insights, streamlined data editing capabilities, and real-time data synchronization. From meticulous data cleaning and transformation to proactive hauler compliance monitoring, every aspect of this project has been meticulously crafted to enhance operational efficiency and drive informed decision-making. As we move forward, this business intelligence solution stands as a testament to our commitment to innovation and excellence in navigating the complexities of modern data management.