Web scraping data for generative AI

Watch the recording of our live webinar on feeding your LLMs with web data, using LLM integrations such as 🦜🔗 LangChain, LlamaIndex 🦙, and Pinecone together with Apify Actors such as Website Content Crawler.

Power up your AI game

Data is the fuel for AI, and the web is the largest source of data ever created. Today's most popular language models, like ChatGPT and LLaMA, were trained on data scraped from the web. Apify gives you the tools to feed, fine-tune, or create your own AI models by putting vast amounts of web data at your fingertips. Watch the webinar recording to learn how to use these tools and to hear the audience Q&A.

What's covered in the webinar? 

  • How web scraping enabled the AI revolution
  • Website Content Crawler demo - why we created it, what it does, and how to ingest entire websites with it
  • Feeding large language models with data from the web via the 🦜🔗 LangChain integration
  • Enriching your LLMs using other Actors from Apify Store
  • Q&A
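
To give a feel for the "feeding LLMs" step, here is a minimal Python sketch of turning crawled pages into documents an LLM pipeline could ingest. It is an illustration under stated assumptions, not the code shown in the webinar: the `Document` class is a local stand-in rather than LangChain's own class, the helper names are hypothetical, and the sample items assume each crawled page carries `url` and `text` fields.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Local stand-in for a LangChain-style document (assumption, not the library class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def item_to_document(item: dict) -> Document:
    # Assumes each crawled item has "text" and "url" keys; a missing or None text becomes "".
    return Document(
        page_content=item.get("text") or "",
        metadata={"source": item.get("url", "")},
    )


def items_to_documents(items: list) -> list:
    # Convert every item, then drop pages that yielded no usable text.
    docs = [item_to_document(item) for item in items]
    return [doc for doc in docs if doc.page_content.strip()]


# Hypothetical sample data standing in for a crawler's output.
sample_items = [
    {"url": "https://example.com/", "text": "Example Domain"},
    {"url": "https://example.com/empty", "text": None},
]

documents = items_to_documents(sample_items)
```

Keeping the page URL in the metadata lets an application cite the source of each answer later on.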

Speakers

🧑‍💻 Jindřich Bär - Website Content Crawler & Tooling developer

🌐 Theo Vasillis - Content Producer & Editor (moderator)

Duration

60 minutes (45 mins demo + 15 mins Q&A)

Requirements

No prior experience is required, but we do recommend checking out our existing resources on data for generative AI.


Learn how to feed your LLMs with web data