proboards_scraper.database

class proboards_scraper.database.Database(db_path)[source]

This class serves as an interface for the SQLite database, and allows items to be inserted/updated or queried using a variety of specific functions that abstract away implementation details of the database and its schema.

Parameters

db_path (pathlib.Path) – Path to SQLite database file.

insert(obj, filters=None, update=False)[source]

Query the database for an object of the given sqlalchemy Metaclass using the given filters to determine if it already exists in the database. If it doesn’t, insert it into the database. Either way, return an integer status code indicating whether the object was inserted or updated, as well as the resulting object from the query.

Although this method can be called directly, it is preferable to call the corresponding insert_* or query_* wrapper methods instead, which simplify the task of querying/inserting into the database.

Parameters
  • obj (sqlalchemy.orm.decl_api.DeclarativeMeta) –

    A sqlalchemy Metaclass instance corresponding to a database table class, i.e., an instance of one of:

  • filters (Optional[dict]) – A dict of key/value pairs on which to filter the query results. The keys should correspond to the attributes of the Metaclass, i.e., attributes of the obj argument class. See example below. If filters is None, the query filters on the id attribute of obj, i.e., obj.id.

  • update (bool) – Whether to update the database entry if the queried object already exists.

Example

The following example demonstrates how to insert a new user into the database. Note that we first create an instance of the user (which is passed to insert()) and filter by the user id (which is the User table primary key). In other words, this searches the database for an existing user with the given filter (i.e., the user with id 7) and, if the user doesn’t exist, inserts it into the database, then returns the inserted object:

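from pathlib import Path

from proboards_scraper.database import Database, User

user_data = {
    "id": 7,
    "date_registered": 1631019126,
    "email": "foo@bar.com",
    "name": "Snake Plissken",
    "username": "snake",
}

new_user = User(**user_data)

# db_path is documented as a pathlib.Path.
db = Database(Path("forum.db"))
# inserted is an integer status code; see Returns.
inserted, new_user = db.insert(
    new_user,
    filters={"id": 7}
)

Return type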

Tuple[int, sqlalchemy.orm.decl_api.DeclarativeMeta]

Returns

(inserted, ret)

  • inserted:

    An integer code denoting insert status.

    • 0: The object failed to be inserted or updated.

    • 1: obj was inserted into the database.

    • 2: obj existed in the database and was updated.

  • ret:

    The inserted object, if the object didn’t previously exist in the database, or the existing object if it did already exist. It is effectively an updated version of obj.
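
The status codes can be illustrated with a minimal, self-contained sketch built on the standard-library sqlite3 module. This is not the library's implementation: the user table, the insert_sketch function, and the choice to return 0 when the row exists and update is False are all assumptions made for illustration.

```python
import sqlite3


def insert_sketch(conn, user, update=False):
    """Hypothetical insert-or-update returning (status, row)."""
    select = "SELECT id, name FROM user WHERE id = ?"
    existing = conn.execute(select, (user["id"],)).fetchone()

    if existing is None:
        # Not found: insert a new row.
        conn.execute(
            "INSERT INTO user (id, name) VALUES (?, ?)",
            (user["id"], user["name"]),
        )
        status = 1
    elif update:
        # Found and update requested: overwrite the stored values.
        conn.execute(
            "UPDATE user SET name = ? WHERE id = ?",
            (user["name"], user["id"]),
        )
        status = 2
    else:
        # Assumption: an existing row left untouched is reported as 0.
        status = 0

    # Re-query so the caller always receives the stored row.
    return status, conn.execute(select, (user["id"],)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")

first = insert_sketch(conn, {"id": 7, "name": "Snake Plissken"})
second = insert_sketch(conn, {"id": 7, "name": "S. Plissken"})
third = insert_sketch(conn, {"id": 7, "name": "S. Plissken"}, update=True)
```

Here first carries status 1, second leaves the stored row unchanged, and third carries status 2 with the updated name.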

insert_avatar(avatar_, update=False)[source]

Insert a user avatar into the database; this method wraps insert().

Parameters
  • avatar_ (dict) – A dict containing the keyword args (attributes) needed to instantiate an Avatar object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Avatar

Returns

The inserted (or updated) Avatar object.

insert_board(board_, update=False)[source]

Insert a board into the database; this method wraps insert().

Parameters
  • board_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Board object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Board

Returns

The inserted (or updated) Board object.

insert_category(category_, update=False)[source]

Insert a category into the database; this method wraps insert().

Parameters
  • category_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Category object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Category

Returns

The inserted (or updated) Category object.

insert_guest(guest_)[source]

Guest users are a special case of User. Guests are users who do not have a user id or a user profile page, and may include deleted users. Since there may be posts or threads started by guests, they are treated as normal users for the purposes of the database, except they are assigned a negative integer user id (which does not exist on the actual forum). Because a given guest has only a username and not a user id, guests are queried by name. If a guest does not already exist in the database, the next smallest negative integer is used as their user id.

Parameters

guest_ (dict) – A dict containing a name key, corresponding to the guest user’s name.

Return type

proboards_scraper.database.User

Returns

The inserted or existing User object corresponding to the guest.
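
The guest id scheme described above might be sketched as follows, using a plain list of dicts in place of the User table. The helper names and the exact rule (one less than the lowest existing id, never above -1) are assumptions based on the description, not the library's code.

```python
def next_guest_id(existing_ids):
    """Return the next unused negative id (assumed rule: lowest id minus one)."""
    lowest = min(existing_ids, default=0)
    # Regular users have positive ids, so clamp to 0 before decrementing.
    return min(lowest, 0) - 1


def insert_guest_sketch(users, name):
    """Look a guest up by name; create it with a negative id if missing."""
    for user in users:
        # Assumption: only guest rows (negative ids) are matched by name.
        if user["id"] < 0 and user["name"] == name:
            return user
    guest = {"id": next_guest_id(u["id"] for u in users), "name": name}
    users.append(guest)
    return guest


users = [{"id": 7, "name": "snake"}]
guest_a = insert_guest_sketch(users, "Guest A")
guest_b = insert_guest_sketch(users, "Guest B")
guest_a_again = insert_guest_sketch(users, "Guest A")
```

Calling the function twice with the same name returns the same guest rather than allocating a second id.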

insert_image(image_, update=False)[source]

Insert an image into the database; this method wraps insert().

Parameters
  • image_ (dict) – A dict containing the keyword args (attributes) needed to instantiate an Image object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Image

Returns

The inserted (or updated) Image object.

insert_moderator(moderator_, update=False)[source]

Insert a moderator into the database; this method wraps insert().

Parameters
  • moderator_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Moderator object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Moderator

Returns

The inserted (or updated) Moderator object.

insert_poll(poll_, update=False)[source]

Insert a poll into the database; this method wraps insert().

Parameters
  • poll_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Poll object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Poll

Returns

The inserted (or updated) Poll object.

insert_poll_option(poll_option_, update=False)[source]

Insert a poll option into the database; this method wraps insert().

Parameters
  • poll_option_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a PollOption object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.PollOption

Returns

The inserted (or updated) PollOption object.

insert_poll_voter(poll_voter_, update=False)[source]

Insert a poll voter into the database; this method wraps insert().

Parameters
  • poll_voter_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a PollVoter object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.PollVoter

Returns

The inserted (or updated) PollVoter object.

insert_post(post_, update=False)[source]

Insert a post into the database; this method wraps insert().

Parameters
  • post_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Post object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Post

Returns

The inserted (or updated) Post object.

insert_shoutbox_post(shoutbox_post_, update=False)[source]

Insert a shoutbox post into the database; this method wraps insert().

Parameters
  • shoutbox_post_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a ShoutboxPost object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.ShoutboxPost

Returns

The inserted (or updated) ShoutboxPost object.

insert_thread(thread_, update=False)[source]

Insert a thread into the database; this method wraps insert().

Parameters
  • thread_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a Thread object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.Thread

Returns

The inserted (or updated) Thread object.

insert_user(user_, update=False)[source]

Insert a user into the database; this method wraps insert().

Parameters
  • user_ (dict) – A dict containing the keyword args (attributes) needed to instantiate a User object.

  • update (bool) – See insert().

Return type

proboards_scraper.database.User

Returns

The inserted (or updated) User object.

query_boards(board_id=None)[source]

Return a list of all boards if no board_id is provided or a specific board if it is provided.

Parameters

board_id (Optional[int]) – A board id (optional).

Return type

Union[List[dict], dict]

Returns

A dict corresponding to a board in the database (if board_id was provided), else a list of dicts of all boards (if board_id was not provided).

Note

The returned Board object(s) are serialized to a human-readable JSON format (Python dict) by serialize().

query_threads(thread_id=None)[source]

Return a list of all threads if no thread_id is provided or a specific thread if it is provided.

Parameters

thread_id (Optional[int]) – A thread id (optional).

Return type

Union[List[dict], dict]

Returns

A dict corresponding to a thread in the database (if thread_id was provided), else a list of dicts of all threads (if thread_id was not provided).

Note

The returned Thread object(s) are serialized to a human-readable JSON format (Python dict) by serialize().

query_users(user_id=None)[source]

Return a list of all users, if no user_id is provided, or a specific user, if it is provided.

Parameters

user_id (Optional[int]) – A user id (optional).

Return type

Union[List[dict], dict]

Returns

A dict corresponding to a user in the database (if user_id was provided), else a list of dicts of all users (if user_id was not provided).

Note

The returned User object(s) are serialized to a human-readable JSON format (Python dict) by serialize().
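
Because the query_* methods return either a single dict or a list of dicts, calling code may want to normalize the result before iterating. The helper below is a hypothetical convenience, not part of the module; the sample dicts stand in for serialized query results.

```python
from typing import List, Union


def as_list(result: Union[List[dict], dict]) -> List[dict]:
    """Wrap a single serialized object in a list; copy a list unchanged."""
    if isinstance(result, dict):
        return [result]
    return list(result)


# Stand-ins for what a query with and without an id might return.
single = {"id": 42, "name": "General"}
many = [{"id": 42, "name": "General"}, {"id": 43, "name": "Off-topic"}]

names_single = [board["name"] for board in as_list(single)]
names_many = [board["name"] for board in as_list(many)]
```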

class proboards_scraper.database.Avatar(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table links a user to their avatar.

image_id

Image id of the image that corresponds to this avatar; see Image.

Type

int

user_id

User id of the user to which this avatar belongs; see User.

Type

int

class proboards_scraper.database.Board(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table contains information on boards and their associated metadata.

id

Board number obtained from the board URL, e.g., https://yoursite.proboards.com/board/42/general refers to the “General” board with id 42.

Type

int
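
As a rough illustration of how an id can be read from a URL of this shape, the sketch below extracts the integer that follows a given path segment. The function name id_from_url is hypothetical, and the scraper itself may obtain ids differently.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def id_from_url(url: str, kind: str) -> Optional[int]:
    """Return the integer following /<kind>/ in the URL path, if any."""
    path = urlparse(url).path
    # Match e.g. "/board/42/" or a trailing "/user/21".
    match = re.search(rf"/{re.escape(kind)}/(\d+)(?:/|$)", path)
    return int(match.group(1)) if match else None


board_id = id_from_url("https://yoursite.proboards.com/board/42/general", "board")
user_id = id_from_url("https://yoursite.proboards.com/user/21", "user")
missing = id_from_url("https://yoursite.proboards.com/", "board")
```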

category_id

Category id of the category to which this board belongs; see Category.

Type

int

description

Board description.

Type

str

name

Board name. Required.

Type

str

parent_id

Board id of this board’s parent board, if it is a sub-board.

Type

int

password_protected

Whether the board is password-protected.

Type

bool

url

Board URL.

Type

str

moderators

List of this board’s moderators, if any; see Moderator.

sub_boards

List of this board’s sub-boards, if any.

threads

List of this board’s threads; see Thread.

class proboards_scraper.database.Category(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table stores information on categories (on the main page) and their associated metadata.

id

Category id number obtained from the main page source.

Type

int

name

Category name.

Type

str

boards

List of boards belonging to this category; see Board.

class proboards_scraper.database.CSS(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

Table for storing information related to downloaded CSS files. The CSS files themselves should be stored on disk.

id

An arbitrary autoincrementing primary key for each CSS file.

Type

int

description

Description of the file.

Type

str

filename

Filename of the CSS file stored on disk.

Type

str

md5_hash

MD5 hash of the downloaded file.

Type

str

url

Original URL of the CSS file.

Type

str

class proboards_scraper.database.Image(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table stores generic metadata for any image. Image files themselves should be downloaded and stored somewhere; this table only records the filename of the downloaded file (which may differ from the original filename, found in the url). The table may also be used to store metadata on files that no longer exist, e.g., an avatar hosted on a site that no longer exists, as a record of the original URL.

id

An arbitrary autoincrementing primary key for each image.

Type

int

description

Description of the image. Optional.

Type

str

filename

Filename of the downloaded file on disk.

Type

str

md5_hash

MD5 hash of the downloaded file.

Type

str

size

Size, in bytes, of the downloaded file.

Type

int
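
The md5_hash and size values can be derived from the downloaded bytes with the standard library. The sketch below assumes the file contents are already in memory; the function name and the sample filename and URL are made up for illustration.

```python
import hashlib


def image_metadata(data: bytes, filename: str, url: str) -> dict:
    """Build a dict with the hash and size fields described above."""
    return {
        "filename": filename,
        "md5_hash": hashlib.md5(data).hexdigest(),  # hex digest of the bytes
        "size": len(data),  # size in bytes
        "url": url,
    }


meta = image_metadata(b"hello", "avatar.png", "https://example.com/a.png")
```

A dict of this shape could then be passed to insert_image(), provided its keys match the Image attributes.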

url

Original URL of the file.

Type

str

See also

The Avatar table, which links an Image to a User.

class proboards_scraper.database.Moderator(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table links a user to a board they moderate. A given moderation relationship (i.e., board + user combination) must be unique.

board_id

Board id of the board the user moderates; see Board.

Type

int

user_id

User id of the moderator; see User.

Type

int
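
One way to enforce this kind of uniqueness in SQLite is a UNIQUE constraint over both columns. The sketch below uses the standard-library sqlite3 module with a hand-written table; whether the actual schema uses a unique constraint or a composite primary key is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE moderator (
        board_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        UNIQUE (board_id, user_id)
    )
    """
)


def add_moderator(conn, board_id, user_id):
    """Insert a board/user pair; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO moderator (board_id, user_id) VALUES (?, ?)",
            (board_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate pair.
        return False


added = add_moderator(conn, 42, 21)
duplicate = add_moderator(conn, 42, 21)
other_user = add_moderator(conn, 42, 22)
```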

class proboards_scraper.database.Poll(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table stores information on a poll associated with a thread. Specifically, it links the poll id (which is the same as the thread id) to the options for the poll and the users who have voted in the poll.

id

The thread id to which this poll belongs; see Thread.

Type

int

name

Poll name, i.e., the poll question.

Type

str

options

List of options associated with this poll; see PollOption.

voters

List of users who have voted in this poll; see PollVoter.

class proboards_scraper.database.PollOption(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table stores the number of votes for a poll option. A poll option must have an associated poll. Note that poll option ids are unique across the entire site, so we don’t need to create an arbitrary autoincrementing primary key and can simply use the integer value found on the forum itself.

id

Poll option (answer) id obtained from scraping the site.

Type

int

poll_id

Poll id (i.e., thread id) to which this option belongs; see Poll.

Type

int

name

Option name.

Type

str

votes

Number of votes this option received.

Type

int

class proboards_scraper.database.PollVoter(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table links a poll to users who have voted in the poll. Note that we can only see who has voted on a poll but not which option (PollOption) they voted for. Each row in the table corresponds to a unique poll/user combination, since a user can vote, at most, only once in a given poll.

poll_id

Poll id of the poll to which this user/vote belongs; see Poll.

Type

int

user_id

User id of the user/voter; see User.

Type

int

class proboards_scraper.database.Post(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table holds information for each post.

id

Post id, obtained from the forum. Every post on a forum has a unique integer id.

Type

int

date

When the post was made (Unix timestamp).

Type

int
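
Unix timestamps such as this one can be converted to timezone-aware datetimes for display. The sketch assumes the stored values are whole seconds since the epoch and interprets them as UTC; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def to_datetime(timestamp: int) -> datetime:
    """Convert a Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# The date_registered value used in the insert() example.
registered = to_datetime(1631019126)
registered_iso = registered.isoformat()
```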

edit_user_id

User id of the user who made the last edit, if any; see User. If never edited, this value will be null.

Type

int

last_edited

When the post was last edited (Unix timestamp), if at all. If never edited, this value will be null.

Type

int

message

Post content/message, including any raw HTML.

Type

str

thread_id

Thread id of the thread in which the post was made; see Thread.

Type

int

url

Original post URL.

Type

str

user_id

User id of the user who made the post; see User.

Type

int

class proboards_scraper.database.ShoutboxPost(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table holds information for each shoutbox post.

id

Autoincrementing primary key.

Type

int

date

When the post was made (Unix timestamp).

Type

int

message

Post content/message, including any HTML.

Type

str

user_id

User id of the user who made the post; see User.

Type

int

class proboards_scraper.database.Thread(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table stores information for each thread.

id

Thread id, obtained from the thread URL.

Type

int

announcement

Whether the thread is marked as an announcement.

Type

bool

board_id

Board id of the board in which the thread was made; see Board.

Type

int

locked

Whether the thread is locked.

Type

bool

title

Thread title.

Type

str

url

Original URL.

Type

str

sticky

Whether the thread is stickied.

Type

bool

user_id

User id of the user who started the thread; see User.

Type

int

views

Number of thread views.

Type

int

posts

A list of this thread’s posts; see Post.

class proboards_scraper.database.User(**kwargs)[source]

Bases: sqlalchemy.orm.decl_api.Base

This table holds information on users obtained from their user profile.

id

User number obtained from the user’s profile URL, e.g., https://yoursite.proboards.com/user/21 refers to the user with user id 21. A negative value indicates a “guest” or deleted user and does not refer to an actual user id.

Type

int

age

User age. Optional.

Type

int

birthdate

User birthdate string. Optional.

Type

str

date_registered

Unix timestamp.

Type

int

email

User email.

Type

str

instant_messengers

Optional; a string consisting of semicolon-delimited messenger_name:screen_name pairs, e.g., "AIM:ssj_goku12;ICQ:12345;YIM:duffman20".

Type

str
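
A string in this format can be split back into a mapping with a few lines of Python. The parser below is a hypothetical helper and assumes messenger names contain no colon or semicolon.

```python
def parse_instant_messengers(value):
    """Parse "NAME:screen;NAME:screen" into a dict of name -> screen name."""
    result = {}
    if not value:
        return result
    for pair in value.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        # Split on the first colon only, in case a screen name contains one.
        messenger, _, screen_name = pair.partition(":")
        result[messenger] = screen_name
    return result


messengers = parse_instant_messengers("AIM:ssj_goku12;ICQ:12345;YIM:duffman20")
```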

gender

Optional (“Male”/“Female”/“Other”).

Type

str

group

Group/rank (e.g., “Regular Membership”, “Global Moderator”).

Type

str

last_online

Unix timestamp.

Type

int

latest_status

User’s latest status. Optional.

Type

str

location

User location. Optional.

Type

str

name

Display name.

Type

str

post_count

Number of posts (scraped from the user’s profile page); to get the actual number of posts by the user in the database, use the posts attribute.

Type

int

signature

User signature. Optional.

Type

str

url

Original user profile page URL.

Type

str

username

Registration name.

Type

str

website

User website. Optional.

Type

str

website_url

User website URL. Optional.

Type

str

avatar

The avatar associated with the user; see Avatar.

posts

A list of posts by the user; see Post.

threads

A list of threads started by the user; see Thread.

proboards_scraper.database.serialize(obj)[source]

Helper function that recursively serializes a database table object (or list of objects) and returns them as Python dictionaries.

Parameters

obj (Union[sqlalchemy.orm.decl_api.DeclarativeMeta, list]) –

A sqlalchemy Metaclass instance, i.e., one of:

Returns

Serialized version of the object (or list of objects).

Return type

Union[dict, List[dict]]
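
The general idea of recursive serialization can be sketched with plain Python classes standing in for the table classes. The actual function operates on SQLAlchemy objects and may differ considerably; serialize_sketch and the two toy classes below are assumptions for illustration only.

```python
def serialize_sketch(obj):
    """Recursively convert objects (or lists of objects) into dicts."""
    if isinstance(obj, list):
        return [serialize_sketch(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Skip private attributes; recurse into nested objects and lists.
        return {
            key: serialize_sketch(value)
            for key, value in vars(obj).items()
            if not key.startswith("_")
        }
    return obj  # plain values (int, str, None, ...) pass through unchanged


class ToyPost:
    def __init__(self, id, message):
        self.id = id
        self.message = message


class ToyThread:
    def __init__(self, id, title, posts):
        self.id = id
        self.title = title
        self.posts = posts


thread = ToyThread(1, "Welcome", [ToyPost(10, "Hello"), ToyPost(11, "Hi")])
serialized = serialize_sketch(thread)
```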