WebDevGrad: Django

Search

Database Search

A general search of an attached database is simple to set up. Use a form or button to query the database, with either free-text user input or a dropdown menu, then return the matching items and loop through them on an HTML results page. All of this is covered in the CRUD section.

Algolia

Set up an Algolia Account

Go to the Algolia website and sign up for a free account. I signed up through GitHub to keep all of my web development tools connected. You'll need to get three key pieces of information:

Algolia APP ID
Algolia Search Key
Algolia Write Key

These can all be found once you have created your first app: go to your account dropdown in the top right-hand corner, select settings, and then "API Keys", which is under the "Teams and Access" section.

Prepare Your Django Project

The first thing to do in your Django project is to install the Algolia Python client. This is the official Algolia Python API client you'll use for pushing and managing your indices. Also install Beautiful Soup 4 to parse HTML and extract text. I need Beautiful Soup 4 in my project because there is no backend database to search, so I scrape the text from each HTML page so that the pages can be searched.

        
pip install algoliasearch
pip install beautifulsoup4

Then add them to your requirements.

pip3 freeze --local > requirements.txt

Next, introduce your Algolia credentials securely in your project by defining them inside your env.py file.

        
import os

os.environ["ALGOLIA_APP_ID"] = ""
os.environ["ALGOLIA_SEARCH_KEY"] = ""
os.environ["ALGOLIA_WRITE_KEY"] = ""

Then load them from your environment variables in your settings.py.

        
ALGOLIA_APP_ID = os.getenv('ALGOLIA_APP_ID')
ALGOLIA_SEARCH_KEY = os.getenv('ALGOLIA_SEARCH_KEY')
ALGOLIA_WRITE_KEY = os.getenv('ALGOLIA_WRITE_KEY')
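
Because os.getenv returns None when a variable is missing, and env.py starts out with empty strings, a forgotten key only shows up later as a failed Algolia call. As a minimal sketch (the helper name require_env is hypothetical and not part of Django or Algolia), settings.py could fail early instead:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Demo values only, so the sketch runs on its own; real keys live in env.py
os.environ.setdefault("ALGOLIA_APP_ID", "demo-app-id")
ALGOLIA_APP_ID = require_env("ALGOLIA_APP_ID")
print(ALGOLIA_APP_ID)
```

This assumes env.py has already been imported by the time settings.py reads the values.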

Create an App to Handle the Search

To keep things modular, I created an app named search and then added it to the list of installed apps in my settings.py file.

python manage.py startapp search

This app will provide 5 core things:

A template for the search bar that can be included into my base template (I actually created two, as the search bar in my site header on larger screens has a different layout to the search bar in the burger menu on smaller screen sizes).
A template for a page to display the search results
A view to handle the search and results display
URLs to handle the search
An indexer to add records to Algolia. This is not strictly necessary for projects with a backend you want to search through, such as the products on a shop site. For those, Algolia lets you upload your data as a JSON, CSV (comma-separated values) or TSV (tab-separated values) file, or even crawl your website for data.
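
If you take the file upload route mentioned in the last point, the records first need to be written out in one of those formats. A minimal sketch of exporting a list of records to JSON with the standard library follows; the function name export_records, the file name and the record fields are all assumptions for illustration:

```python
import json
import tempfile
from pathlib import Path


def export_records(records, destination):
    """Write a list of record dictionaries to a JSON file and return its path."""
    path = Path(destination)
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Hypothetical records; the field names are illustrative only
sample_records = [
    {"objectID": "1", "title": "forms", "url": "http://127.0.0.1:8000/HTML/forms/"},
    {"objectID": "2", "title": "home", "url": "http://127.0.0.1:8000/HTML/home"},
]

output = export_records(sample_records, Path(tempfile.gettempdir()) / "algolia_records.json")
print(f"Wrote {len(sample_records)} records to {output}")
```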

Create an Indexer for the Templates

Within your search app, create a Python file to index your files, such as index_templates.py. The pseudocode would look like:

Standard Imports
Set up Django (so that the file can run)
Third-party Imports
Helper functions
Create an asynchronous function to index the website
- Create Algolia client variable
- Select the Algolia index where the documents will be stored
- Clear the index before adding new files so that there are no duplicates
- Define the templates that I want to index
- Initialise the records and index the HTML files, adding them to the records
- Upload the records
- Close the Algolia client
Add a main block to run the file from the terminal

Standard Imports

        
# Provides functions to interact with the operating system
import os
# Allows manipulation of the Python runtime environment
import sys
# Required to set up Django so that settings.py can be accessed
import django
# Easier manipulation of filesystem paths
from pathlib import Path


    

Set up Django, so that the file can run from your project terminal with e.g. python search/index_templates.py

        
# Configure the root directory for your Python project
# It is two levels up from this script's location in search/index_templates.py
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

# Set the Django settings module environment variable
# Replace webdevelopment.settings with your actual project.settings module
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "webdevelopment.settings") 
# Initialize Django so ORM and settings are available and the file can be run from the terminal
django.setup()
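
To see why two .parent steps reach the project root, here is a small sketch using a hypothetical absolute path (the /home/user/webdevelopment location is an assumption for illustration):

```python
from pathlib import PurePosixPath

# Hypothetical location of the indexer script
script = PurePosixPath("/home/user/webdevelopment/search/index_templates.py")

app_dir = script.parent               # the search app folder
project_root = script.parent.parent   # the folder that contains manage.py

print(app_dir)
print(project_root)
```

Adding the project root to sys.path is what lets the settings module be imported when the script is started from the terminal.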


    

Third-party Imports

        
# Library for parsing HTML and extracting data
from bs4 import BeautifulSoup
# Algolia client for pushing records to search index
from algoliasearch.search.client import SearchClient
# Access Django settings to retrieve the security keys
from django.conf import settings
# Library for writing asynchronous code which is used with the async Algolia client
import asyncio
# Regex module, used in slugify helper
import re


    

Helper functions. I used one helper function to slugify my filenames and convert them to IDs so that they can be tagged onto URLs correctly. This lets search results link directly to my code snippets and examples on their pages, rather than just to the top of the page they appear on.

        
def slugify(value: str) -> str:
    """
    Converts a string into a slug suitable for use as an HTML element ID.
    Replaces all non-alphanumeric characters with hyphens, trims extras,
    and lowercases the result.
    """
    value = re.sub(r'[^a-zA-Z0-9]+', '-', value).strip('-')
    return value.lower()
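
A quick way to check the helper is to run it on a few inputs. The block below repeats the function so that it runs on its own; the sample strings are illustrative:

```python
import re


def slugify(value: str) -> str:
    """Same helper as above: runs of non-alphanumeric characters become single hyphens."""
    value = re.sub(r"[^a-zA-Z0-9]+", "-", value).strip("-")
    return value.lower()


print(slugify("form_validation_snippet"))  # form-validation-snippet
print(slugify("  Hello, World!  "))        # hello-world
print(slugify("HTML Forms"))               # html-forms
```

Note that the result is always lowercase, so any ID it is compared against needs the same casing for that part.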


    

Asynchronous function: setup. I included a lot of print statements throughout the function to help me debug later, but these aren't strictly necessary. In the setup stage I define the function, write a docstring, and define my Algolia variables.

        
async def run():
    """
    Reads all HTML templates in multiple folders across your project,
    extracts titles and paragraphs, clears the Algolia index, and uploads them.
    """

    print("Starting indexer...")
    # Create Algolia client using your app ID and WRITE key from the settings.py
    client = SearchClient(settings.ALGOLIA_APP_ID, settings.ALGOLIA_WRITE_KEY)
    print("Algolia client initialized.")

    # Choose the index name. This will be the name of the index when it appears in your Algolia dashboard
    index_name = 'webdevgrad_pages'
    print(f"Index '{index_name}' selected.")

    # Clear the index before uploading new records
    print("Clearing existing records in the index...")
    response = await client.clear_objects(index_name)
    print(f"Clear index response: {response}")


    

Asynchronous function. Next I defined the templates I want to index.

        
    # List all template folders. Here I commented out the main templates folder as all the pages I wanted to search were defined within their own apps.
    # The path uses the BASE_DIR defined in the settings.py and tags on /'app_name'/'templates'
    templates_dirs = [
        # Path(settings.BASE_DIR) / 'templates',             # main templates folder
        Path(settings.BASE_DIR) / 'HTML' / 'templates',    # app1 templates
        Path(settings.BASE_DIR) / 'intro_to_full_stack' / 'templates',    # app2 templates
    ]
    print(f"Templates directories to scan: {templates_dirs}")


    

Asynchronous function. Initialise the records and loop through the HTML files. For me this became quite complicated due to my file layout, but it should be easy to simplify for your needs.
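
Before reading the full listing, it can help to see the URL rules on their own. The sketch below isolates the three cases (code snippet or example, topic section, topic page) into one pure function. The names build_url and BASE_URL are hypothetical, and it slugs the filename only, whereas the full listing uses the page title when one exists:

```python
import re
from pathlib import PurePosixPath

BASE_URL = "http://127.0.0.1:8000"  # assumption: local development server


def slugify(value: str) -> str:
    return re.sub(r"[^a-zA-Z0-9]+", "-", value).strip("-").lower()


def build_url(relative_path: str):
    """Return (url, section_type) for a path relative to a templates folder."""
    path = PurePosixPath(relative_path)
    folder = path.parent
    kind = None

    if folder.name in {"code_snippets", "code_examples"}:
        section_type = "Code Snippet" if folder.name == "code_snippets" else "Code Example"
        kind = folder.name.replace("_", "-")
        page_dir = folder.parent.parent  # step over includes/<kind>
    elif folder.name == "includes":
        section_type = "Topic Section"
        page_dir = folder.parent
    else:
        section_type = "Topic Page"
        page_dir = folder

    page_url = page_dir.as_posix().replace("_", "-")
    if not page_url.endswith("home"):
        page_url = page_url.rstrip("/") + "/"  # exactly one trailing slash

    url = f"{BASE_URL}/{page_url}"
    if section_type != "Topic Page":
        parts = [page_dir.as_posix().replace("/", "-").replace("_", "-")]
        if kind:
            parts.append(kind)
        parts.append(slugify(path.stem))
        url += "#" + "-".join(parts)
    return url, section_type


print(build_url("HTML/forms/includes/code_snippets/form_validation_snippet.html"))
print(build_url("HTML/forms/includes/labels.html"))
print(build_url("HTML/home/home.html"))
```

The full listing that follows applies the same three cases inline while also building the other record fields.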

        
    # Initialise an empty list of records
    records = []
    # Set the first object_id as 1
    object_id = 1

    # Loop through each templates folder and index all HTML files found
    # Iterate over the templates_dirs defined above, using templates_path as the loop variable
    for templates_path in templates_dirs:
        # If the folder does not exist, skip it
        if not templates_path.exists():
            print(f"Warning: templates directory {templates_path} does not exist, skipping.")
            continue

        # For x in the templates_path
        # .rglob walks through all folders and subfolders
        # finds all files that end in .html
        for file_path in templates_path.rglob("*.html"):
            try:
                # Try to open it in read mode
                with open(file_path, "r", encoding="utf-8") as f:
                    # Read the entire file contents as a single string
                    html = f.read()
            # Report the error if the file cannot be read and continue to the next one
            except Exception as e:
                print(f"Error reading {file_path}: {e}")
                continue

            # This breaks up the long string in the variable html using the html parser into something that can be queried
            # For instance if html had a title element you can access it with soup.title
            soup = BeautifulSoup(html, "html.parser")
            # Creates a title variable from the page's title element, if it exists and contains plain text
            # Otherwise it takes the filename and removes the extension (.html)
            title = soup.title.string if soup.title and soup.title.string else file_path.stem
            
            # Create a paragraphs variable
            # Loops through each p element in the soup
            # Gets the text and strips the element tags
            # Concatenates them all into one long string
            paragraphs = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))

            # Set up the relative paths and directories to later be used to determine URL structure 
            # Path relative to template root
            relative_path = file_path.relative_to(templates_path)  
            # Parent directory of the file
            relative_dir = relative_path.parent  

            # Depending on the type of file, I want the URLs to be different
            # For example if it is a code snippet included on a page then the URL will end with #code-snippet-id
            # Case 1: The file is from Topic/SubTopic/includes/code_snippets or Topic/SubTopic/includes/code_examples
            # Check if code_snippets or code_examples is in the relative directory name
            if relative_dir.name in {"code_snippets", "code_examples"}:
                # Define the original section as the relative directory name
                original_section = relative_dir.name
                # The following works because I set the IDs of the sections in the form of e.g.
                # topic-subtopic-type-filename
                # For example for my code snippet of form_validation_snippet.html
                # HTML/forms/includes/code_snippets/form_validation_snippet.html
                # Has the ID
                # HTML-forms-code-snippets-form-validation-snippet
                # The following code converts the file path to this ID
                # Convert underscores in the file path to hyphens for the ID as all my IDs use hyphens not underscores
                # The preserved part is the current directory name with underscores to hyphens
                # E.g. code-snippets
                preserved_part = relative_dir.name.replace("_", "-")  
                # Go up 2 levels  to get to the topic folder e.g. HTML/forms/includes/code_snippets/ gets to HTML/forms
                relative_dir = relative_dir.parent.parent
                # Replace any underscores with hyphens if there are in e.g. HTML/forms
                relative_url = relative_dir.as_posix().replace("_", "-")

                # A special case for the topic homepages. E.g. HTML/home
                # This works across the site because the homepages for each section are named “home”
                # E.g. HTML/home, CSS/home, flask_app/home
                if relative_url.endswith("home"):
                    # No trailing / so that IDs can be tagged on without errors
                    # E.g. website.com/HTML/home#ID1 
                    # Instead of the error causing: website.com/HTML/home/#ID1 
                    relative_url = relative_url  # no slash for home pages
                else:
                    # For non-home pages add a slash
                    relative_url = relative_url.rstrip("/") + "/"  # Ensure only 1 slash

                # Define a directory slug variable that takes the relative_dir (that we defined as jumping up 2 levels)
                # .as_posix() converts it to a string with "/"
                # Replace any slashes with hyphens and any underscores with hyphens matching my ID notations
                # Add in an extra hyphen and then the preserved part (the original directory name with hyphens)
                # E.g. HTML-forms + - + code-snippets = HTML-forms-code-snippets
                dir_slug = relative_dir.as_posix().replace("/", "-").replace("_", "-") + "-" + preserved_part

                # Create an element_id variable that combines the directory slug with the title (which was defined as the filename minus the extension) separated by a hyphen
                # Use the slugify helper function on the title
                # E.g. HTML-forms-code-snippets + form-validation-snippet = HTML-forms-code-snippets-form-validation-snippet
                element_id = f"{dir_slug}-{slugify(title)}"

                # Create a URL variable built from the above variables
                # When deployed, replace 127.0.0.1:8000 with the website URL
                url = f"http://127.0.0.1:8000/{relative_url}#{element_id}"

                # Define a section_type parameter
                if original_section == "code_snippets":
                    section_type = "Code Snippet"
                else:
                    section_type = "Code Example"

                    
            # Case 2: The file is from Topic/SubTopic/includes/
            # Because my file set up is Topic/SubTopic/subtopic.html
            # But each section is included from
            # e.g. Topic/SubTopic/includes/section1.html
            # The logic is similar to Case 1

            elif relative_dir.name == "includes":
                relative_dir = relative_dir.parent
                relative_url = relative_dir.as_posix().replace("_", "-")

                # Special case for home pages
                if relative_url.endswith("home"):
                    relative_url = relative_url  # no slash for home pages
                else:
                    relative_url = relative_url.rstrip("/") + "/"  # Ensure only 1 slash

                # Slug the directory
                dir_slug = relative_dir.as_posix().replace("/", "-").replace("_", "-")

                element_id = f"{dir_slug}-{slugify(title)}"

                # Create the URL
                # When deployed, replace 127.0.0.1:8000 with the website URL
                url = f"http://127.0.0.1:8000/{relative_url}#{element_id}"

                # Set the section_type
                section_type = "Topic Section"

            # In the else case, if it is not in includes, code_snippets etc., it will be a Topic page
            # E.g. HTML/forms/forms.html
            else:
                relative_url = relative_dir.as_posix().replace("_", "-")
                # Special case for home page
                if relative_url.endswith("home"):
                    relative_url = relative_url  # no slash for home pages
                else:
                    relative_url = relative_url.rstrip("/") + "/"  # Ensure only 1 slash

                dir_slug = relative_dir.as_posix().replace("/", "-").replace("_", "-")
                # There is no element_id for the topic page as we want the url to point to the page, not the ID of an element on it
                element_id = "N/A"
                # Define the URL
                # When deployed, replace 127.0.0.1:8000 with the website URL
                url = f"http://127.0.0.1:8000/{relative_url}"
                # Set the section type variable
                section_type = "Topic Page"
	
            # Create a section name variable
            # Set it as "Other" as default
            section = "Other"
            # Use a separate loop variable so the outer templates_path is not overwritten
            for candidate_dir in templates_dirs:
                try:
                    relative = file_path.relative_to(candidate_dir)
                    # Set the section to the top level folder name e.g. HTML, CSS etc.
                    section = relative.parts[0]
                    break
                except ValueError:
                    continue

            # Capitalize if not HTML or CSS
            if section not in {"HTML", "CSS"}:
                section = section.title()
	
            # Here I created a ranking order based on the section type, so that I can give priority to e.g. the Topic page over a code snippet if you were to search "forms"
            sort_order = {
                "Topic Page": 0,
                "Topic Section": 1,
                "Code Example": 2,
                "Code Snippet": 3,
            }
            # I set the rank_priority variable to get the sort_order based on section_type and default to 99 if not found
            rank_priority = sort_order.get(section_type, 99)

            # I create a pretty_title variable for a user friendly version of the title
            pretty_title = title.replace("_", " ").replace("-", " ").title()
            # Special case for Home pages
            if pretty_title == "Home":
                pretty_title = f"{section} Home"

            # Update Html, Css, Js to HTML CSS JS
            pretty_title = pretty_title.replace("Html", "HTML")
            pretty_title = pretty_title.replace("Css", "CSS")
            pretty_title = pretty_title.replace("Js", "JS")

            # Add in extra keywords so that HTML returns HTML home
            # This fixes a bug where searching e.g. HTML would not show the HTML homepage
            # Now the extra keywords appear in the search
            extra_keywords = []
            # If the title ends with Home:
            if pretty_title.endswith("Home"):
                # Add the section e.g. HTML, CSS etc into the keywords
                extra_keywords.append(section)

            # Skip if the title is contents and append if it is not
            # This is because I did not want each page's contents section to be returned in a search
            if title.lower() != "contents":
                # Append the dictionary to the records
                records.append({
                    "objectID": object_id,
                    "file": str(file_path.relative_to(templates_path)),
                    "title": title,
                    "prettyTitle": pretty_title,
                    "elementID": element_id,
                    "sectionType": section_type,
                    "section": section,
                    "content": paragraphs,
                    "url": url,
                    "rankPriority": rank_priority,
                    "keywords": extra_keywords,
                })
                print(f"Prepared record {object_id}: {title} ({file_path})")
                # Increment the object_id for the next file
                object_id += 1
            else:
                print(f"Skipping file {file_path} because title is 'contents'")

    print(f"Total records prepared to upload: {len(records)}")
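
The pseudocode earlier lists an upload step between preparing the records and closing the client. A minimal sketch of that step is below, assuming the v4 Python client's save_objects helper; check the argument names against the version you have installed. The upload_records wrapper and the FakeClient stand-in are hypothetical and exist only so the sketch runs without network access:

```python
import asyncio


async def upload_records(client, index_name, records):
    """Send the prepared records to the index, skipping the call if there are none."""
    if not records:
        print("No records to upload.")
        return None
    # Assumption: the client exposes an awaitable save_objects(index_name, objects)
    response = await client.save_objects(index_name=index_name, objects=records)
    print(f"Uploaded {len(records)} records to '{index_name}'.")
    return response


class FakeClient:
    """Stand-in for the async Algolia client, used only to show the call shape."""

    def __init__(self):
        self.saved = []

    async def save_objects(self, index_name, objects):
        self.saved.extend(objects)
        return {"index_name": index_name, "saved": len(objects)}


demo_client = FakeClient()
demo_records = [{"objectID": "1", "title": "forms"}]
print(asyncio.run(upload_records(demo_client, "webdevgrad_pages", demo_records)))
```

Inside run(), the equivalent would be a single awaited call on the real client, placed after the records are prepared and before the client is closed.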


    

Asynchronous function: Close the Algolia client.

        
    # Closes the async Algolia client, freeing up network resources.
    # Uses await because closing is also asynchronous.
    await client.close()
    print("Algolia client closed. Indexing complete.")

Add the main block. This sits outside the asynchronous function, at the bottom of the file.

        
# This block ensures the script only runs when executed directly
# (not when imported as a module into another script).
if __name__ == "__main__":
    # Import asyncio here again (even though imported earlier) so this block is self-contained.
    import asyncio

    # Print a message to indicate the script is starting execution.
    print("Starting async indexer...")

    # Run the async `run()` function inside Python's event loop.
    # This triggers the indexing process defined above.
    asyncio.run(run())


    

Creating the View

For your search app, one view needs to be defined that takes the search query and renders the results page. It checks whether there is a query and, if so, connects to the Algolia client with our app ID and search key, then builds a list of results that is passed in the context so they can be displayed on the rendered results page.

        
from django.shortcuts import render
# Import settings to access our APP ID and search keys
from django.conf import settings
# Import the synchronous Algolia search client to connect to our index
from algoliasearch.search.client import SearchClientSync

def search_view(request):
    # Extract the query parameter 'q' from the GET request.
    # request.GET is a dictionary-like object for URL query params (?q=html).
    # If 'q' is missing, default to an empty string ("").
    query = request.GET.get("q", "")
    # Initialise the results list
    results = []

    # Run the search if there is a query
    if query:

        try:
            # Initialize Algolia client with the APP ID and the search key
            client = SearchClientSync(settings.ALGOLIA_APP_ID, settings.ALGOLIA_SEARCH_KEY)

            # Perform the search
            # First argument is the index name ("webdevgrad_pages").
            # Second is a dictionary of search parameters (here: just the query string).
            res = client.search_single_index(
                "webdevgrad_pages",
                {
                    "query": query,
                }
            )

            # Get the hits (results)
            results = res.hits

        except Exception as e:
            # Optional: log error for debugging
            print("Algolia search error:", e)

    # Render the results page passing query as a variable query and results as a variable results
    return render(request, "search/results.html", {
        "query": query,
        "results": results,
    })


    

Creating the URL

A simple app-level URL.

        
from django.urls import path
from . import views

urlpatterns = [
    path('', views.search_view, name="search"),
]

Include it in the project-level URL patterns.

        
from django.urls import include, path

urlpatterns = [
	…,
	…,
    path('search/', include('search.urls')),
]

Search Template

In its simplest form, the search functionality will be a form with the action of the search URL (which will trigger the search view) and a GET method. There will need to be an input for the user to type in and a submit button.

        
<form action="{% url 'search' %}" method="get">
    <input type="text" name="q" placeholder="Search site content" value="{{ query|default:'' }}">
    <button type="submit">Search</button>
</form>

Results Template

The results page is quite simple too. We can access the query from our view context with {{ query }} and the results with {{ results }}. This way we can show the user what they searched for and loop through the results to display the information. We use dot notation on the loop variable to access the fields of each result, which we defined when we appended the records in the indexer.

        
<h2>Search results for "{{ query }}"</h2>
<section>
    {% if results %}
        {% for hit in results %}
            <!-- Link to the page -->
            <a href="{{ hit.url }}">
                <div>
                    <!-- Display the user-friendly title and section type -->
                    <h3>{{ hit.prettyTitle }}</h3>
                    <p>Section type: {{ hit.sectionType }}</p>
                    <!-- Check if there is any content to display -->
                    {% if hit.content %}
                        <!-- Show the content, which is a combination of the paragraphs, limited to 30 words -->
                        <p>Content: {{ hit.content|truncatewords:30 }}</p>
                    {% endif %}
                </div>
            </a>
        {% endfor %}
    {% else %}
        <p>No results found.</p>
    {% endif %}
</section>


    

Tweaking Results

The results can be tweaked from your Algolia dashboard. Go to Data Sources (the cylinder with 3 segments), then Indices, then click on the index. Here you can try out different searches to see what comes up. If you go to Configuration from here, you can set which fields are searched by the user's query and, under ranking and sorting, add custom rankings. This is where I rank by rankPriority (ascending) so that the topic page is shown first in the results for this website, followed by the section and then code examples and snippets.
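
To picture what an ascending rankPriority does, here is a purely local illustration with hypothetical hits. It only shows the ordering among otherwise equal matches; Algolia applies a custom ranking alongside its own relevance criteria, so real result order can differ:

```python
# Hypothetical hits that all match the query "forms" equally well
hits = [
    {"prettyTitle": "Form Validation Snippet", "sectionType": "Code Snippet", "rankPriority": 3},
    {"prettyTitle": "Forms", "sectionType": "Topic Page", "rankPriority": 0},
    {"prettyTitle": "Labels", "sectionType": "Topic Section", "rankPriority": 1},
]

# Ascending order: the lowest rankPriority comes first
ordered = sorted(hits, key=lambda hit: hit["rankPriority"])

for hit in ordered:
    print(hit["rankPriority"], hit["sectionType"], hit["prettyTitle"])
```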