Web scraping data for generative AI
Watch the recording of our live webinar on feeding your LLMs with web data, using integrations such as 🦜🔗 LangChain, LlamaIndex 🦙, and Pinecone together with Apify Actors such as Website Content Crawler.
Power up your AI game
Data is the fuel for AI, and the web is the largest source of data ever created. Today's most popular language models, like ChatGPT and LLaMA, were all trained on data scraped from the web. Apify gives you the tools to feed, fine-tune, or create your own AI models by bringing vast amounts of data from the web to your fingertips. Watch our webinar to learn how to use these tools and ask questions.
What's covered in the webinar?
- How web scraping enabled the AI revolution
- Website Content Crawler demo - why we created it, what it does, and how to ingest entire websites with it
- Feeding large language models with data from the web via the 🦜🔗 LangChain integration
- Enriching your LLMs using other Actors from Apify Store
- Q&A
Speakers
🧑‍💻 Jindřich Bär - Website Content Crawler & Tooling developer
🌐 Theo Vasillis - Content Producer & Editor (moderator)
Duration
60 minutes (45-minute demo + 15-minute Q&A)
Requirements
No prior experience is required, but we do recommend checking out our existing resources on data for generative AI.
Learn how to feed your LLMs with web data
Important resources for generative AI
- Join our AI web scraping channel on Discord to continue the conversation
- Try Website Content Crawler for free
- Data for Generative AI - main page
- Webinar survey: please take 2 minutes to tell us what we can improve
- Our content on generative AI
- Apify & LangChain integration tutorial
- AI-related Apify Actors