It’s never been more critical to have real-time access to clean, reliable data. The ability to instantly analyze complex information relevant to your business means you can make better decisions and utilize resources more effectively.
Imagine you’re a real estate developer or wholesaler. Data extraction here means combing through listings or auctions to discover the best investment opportunities, which, if done manually, is a laborious and tedious process. However, Bytebot allows for automatic data extraction that can dynamically update along with the webpage to get you the information you need.
The San Francisco Treasurer and Tax Collector site lists land and parcel auctions. The auctions vary in class type and whether they’re currently open for bids. Accessing comprehensive auction data on the fly is crucial for real estate professionals such as property speculators, wholesalers, or developers during the bidding process. They need to understand the range of properties available, the bids and details of each property, and potentially integrate this information into analytical tools to inform their bidding strategies.
Real estate professionals could pay for manual data entry, but automation is more straightforward. However, traditional automated scrapers still have to contend with pages whose content and structure change over time, which breaks hard-coded extraction scripts.

Bytebot addresses this. This post demonstrates how to automate data extraction from the San Francisco Treasurer and Tax Collector's auction site using Bytebot. We will cover how to set up Bytebot for extraction, how to extract data into a table, how to dynamically update your extraction pipeline, how to extract specific attributes from listings, and the benefits of deploying Bytebot over other automated tools to improve the efficiency and accuracy of real estate auction data extraction.
Generally, data extraction is built around scripts that automate the process, making it more efficient to pull the data you need from a webpage. Bytebot works similarly. Here’s a breakdown of the script we used to capture data from the auction site.
The benefit is that Bytebot can automatically determine the necessary actions as the page updates, saving users valuable time.
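To make this concrete, here is a minimal sketch of what the setup portion of such a script could look like. The package name, constructor, and `browse()` call are assumptions standing in for the actual Bytebot SDK surface, not verbatim API:

```typescript
// A minimal sketch of the setup. The package name, constructor, and
// browse() call are assumptions, not the verbatim Bytebot SDK API.
import { Bytebot } from "@bytebot/sdk";

const bytebot = new Bytebot({ apiKey: process.env.BYTEBOT_API_KEY });

// Open a session on the auction listings page (placeholder URL).
const page = await bytebot.browse("https://<auction-site>/listings");
```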
Once we’re ready to extract data, we want to take all the relevant auction data and export it to a table where we can quickly parse it. This is where another method in the Bytebot SDK comes into play, letting us pass a schema describing the data we want extracted.
In this script, we have configured a table with four named columns, each accompanied by a freeform prompt describing the column. These prompts guide the LLM in categorizing the extracted data, and we can also specify that the data should be in text form. Each column additionally specifies which action should be performed, in this case copying text. The result is a 2D array, where each inner array is a row containing an object for each column in the table.
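Sketched out, the schema could look something like the following. The field names and column names here are illustrative assumptions, but the shape mirrors the description above: four named columns, a freeform prompt per column, a text data type, and a copy-text action:

```typescript
// Hypothetical schema: four named columns, each with a freeform prompt
// that guides the LLM, a text data type, and a copyText action.
const auctionTableSchema = {
  columns: [
    { name: "parcelId",  prompt: "The parcel or auction ID",             type: "text", action: "copyText" },
    { name: "classType", prompt: "The property class type",              type: "text", action: "copyText" },
    { name: "status",    prompt: "Whether the auction is open for bids", type: "text", action: "copyText" },
    { name: "bidAmount", prompt: "The current or minimum bid amount",    type: "text", action: "copyText" },
  ],
};
```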
Once this is done, we can execute the action, save the result, and print out the table as a TypeScript object.
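Execution and printing might then look like this, reusing the client and page from the setup sketch (the `extract()` method name is an assumption; the row shape follows the description above):

```typescript
// Run the extraction against the page and print the resulting table.
// Each row is an array of { column, value } objects, one per column.
const rows = await bytebot.extract(page, auctionTableSchema);
console.log(rows);
```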
The next stage is something more complex than extracting a static table.
On the auction site, you can view additional details of each property available for auction, including property class, land value, whether or not it has a lien against it, and hyperlinks.
With just a couple of lines of logic, you can pull the details of a particular ID by using `bytebot.act` to make a freeform prompt asking for the additional details of the specific property. This is possible because we can take a row from the table and reference that row’s ID in the freeform prompt. Then, using `detailActAction`, we can click the show/hide detail button on the website to expand the details for that property.
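Here is a sketch of that flow, reusing the rows extracted earlier. The `act()` signature, the `execute()` call, and the exact prompts are assumptions based on the description above:

```typescript
// Take a row from the extracted table and pull out its ID.
// Row shape per the schema above: one { column, value } object per column.
const row = rows[0];
const parcelId = row.find((cell) => cell.column === "parcelId")?.value;

// Click the show/hide detail button for that listing (prompt-driven).
const detailActAction = await bytebot.act(
  page,
  `Click the show/hide details button for the property with ID ${parcelId}`
);
await bytebot.execute(detailActAction);

// With the panel expanded, ask for the additional details as freeform text.
const detailsAction = await bytebot.act(
  page,
  `Extract the property class, land value, and lien status for ID ${parcelId}`
);
const details = await bytebot.execute(detailsAction);
console.log(details);
```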
Unlike other tools, which require you to specify the scraping pipeline at the beginning of your script, Bytebot allows for dynamic routing and parsing in just a few simple steps. This functionality enables you to quickly pass the table we’ve generated to your other APIs, as shown below, and allows for more specific, granular analysis of your extracted data without requiring multiple pipelines.
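For instance, a few purely illustrative lines could filter the extracted table down to open auctions and hand it to a downstream service (the endpoint is a placeholder, and the row shape is the one assumed above):

```typescript
// Keep only auctions currently open for bids.
const openAuctions = rows.filter((row) =>
  row.some((cell) => cell.column === "status" && /open/i.test(cell.value))
);

// Hand the filtered rows to a downstream API (placeholder endpoint).
await fetch("https://analytics.example.com/auctions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(openAuctions),
});
```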
We’ve shown how to extract data with Bytebot and how to update your extraction dynamically. The last piece of the puzzle is extracting specific attributes. Attributes are contextual pieces of information, such as a link’s `href`. Using the helper methods to define our extraction schema, we can either extract these attributes independently or combine them with tables like the one we designed earlier. In this example, we will extract the parcel map link for a specific auction.

As before, we use a dynamically generated freeform prompt containing the ID number to guide the LLM on what to target. Since the LLM can look at the entire webpage, giving it specific instructions on what to target and focus on is key to successfully extracting the data we want.

The LLM helps specify the action and extract the attribute we’re targeting, in this case `href`. When we execute the action and print the result, we get the parcel link, which leads to the PDF on the webpage.
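In sketch form, with the method names and the attribute-copy action shape again assumed rather than taken from the SDK:

```typescript
// Target the parcel map link for a specific listing and copy its href.
// The options object and copyAttribute action are assumptions.
const mapLinkAction = await bytebot.act(
  page,
  `Find the parcel map link for the property with ID ${parcelId}`,
  { action: "copyAttribute", attribute: "href" }
);

const parcelMapUrl = await bytebot.execute(mapLinkAction);
console.log(parcelMapUrl); // resolves to the parcel map PDF on the site
```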
Once you’ve run all these processes, you can view the results through the Bytebot dashboard. You can inspect the session and see the specific steps Bytebot took to extract your data, which is extremely valuable for validating the pipeline and for observability.
Improve your investment strategies with Bytebot, using real-time, dynamic pipelines to get the data you need for crucial business decisions. Bytebot automates complex data extraction tasks, ensuring you gather the detailed information you require reliably and efficiently.
Are you a developer ready to improve your data extraction capabilities? Sign up to start using Bytebot and transform your data extraction processes today.