OpenAustralia.org

Use clean/pretty URLs instead of nasty /debate/?id=2009-02-06.12.1 form

Details

  • Type: Improvement
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: Application
  • Labels:
    None

Activity

Nathanael Boehm added a comment -

I'm happy to work up a schema/model for clean URLs. Just assign the issue to me.

Matthew Landauer added a comment -

Thanks for offering!

Kat Szuminska added a comment -

maybe one for hackfest?

Nathanael Boehm added a comment -

Preliminary schema idea attached as JPEG of a mind map.

Date first, /YYYY/MM/DD/, then the house (senate or reps), then the type of debate (such as a bill reading), then the topic (such as environment or economy), then a unique ID in case there are multiple debates of the same type and topic on the same day.

Type and topic are parsed from the title of the debate. We will need to keep a very simple lookup taxonomy of types and terms, with alternative terms to match on. If the URL builder can't make a match on either, it'll omit that part or just use "other" or something.
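
A minimal sketch of how a URL builder for this scheme might assemble the segments. The function name `build_debate_url`, its parameters, the trailing slash and the example ID are assumptions made for illustration; nothing here exists in the OA codebase.

```python
from datetime import date
from typing import Optional


def build_debate_url(day: date, house: str, uid: str,
                     debate_type: Optional[str] = None,
                     topic: Optional[str] = None) -> str:
    """Assemble /YYYY/MM/DD/house/type/topic/id/, skipping unmatched parts."""
    if house not in ("senate", "reps"):
        raise ValueError(f"unknown house: {house!r}")
    segments = [f"{day:%Y/%m/%d}", house]
    # Type and topic are optional: leave out whichever could not be matched.
    segments += [s for s in (debate_type, topic) if s]
    segments.append(uid)
    return "/" + "/".join(segments) + "/"


# Hypothetical usage; the ID "12" is made up.
print(build_debate_url(date(2009, 2, 6), "senate", "12", "bill", "environment"))
print(build_debate_url(date(2009, 2, 6), "reps", "1"))
```

One thing the sketch makes visible: when a segment is omitted, the position of the remaining segments shifts, so a URL with only one of type or topic is ambiguous to parse back. Falling back to "other" would keep the shape fixed.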

What do you think?

@kat - yep, as discussed at the teleconference on the weekend, this is for hackfest, but I want to go in with some ideas so we have a head start and make sure we get something implemented by the end of the day.

Cheers,

Nat

Nathanael Boehm added a comment -

BTW the type/topic taxonomy included in the mind map is only a basic, preliminary proof of concept to gauge the extent of the taxonomy and alternative terms lookup required. It is incomplete.

Matthew Landauer added a comment -

@Nat - could you please explain roughly how the parsing for titles -> type and titles -> topic would work. That's not entirely clear to me yet. Thanks!

Matthew Landauer added a comment -

OA-162 was closed as a duplicate of this ticket

Nathanael Boehm added a comment - edited

@Matthew We'll need to maintain a reference data table, preferably through an admin web interface (which I don't believe OA currently has, but will probably need in future), so we can enter a list of root terms and the alternative terms that translate to each root term.

For example, when the clean URL engine is processing, say:

"Carbon Pollution Reduction Scheme Bill 2009"

It picks up the term "pollution" as a match against alternative terms, finds the mapped root term "environment" and sets that as the topic.
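
A rough sketch of that lookup, assuming the taxonomy is held as a plain mapping from alternative term to root term. The name `find_topic` and the word-by-word matching are assumptions for illustration, not an existing part of OA.

```python
import re
from typing import Optional

# Hypothetical sample rows: alternative term -> root term.
ALTERNATIVES = {
    "pollution": "environment",
    "wetlands": "environment",
    "workplace": "employment",
}


def find_topic(title: str) -> Optional[str]:
    """Return the root term for the first word in the title that matches."""
    for word in re.findall(r"[a-z]+", title.lower()):
        if word in ALTERNATIVES:
            return ALTERNATIVES[word]
    # No match: the caller would omit the topic from the URL.
    return None


print(find_topic("Carbon Pollution Reduction Scheme Bill 2009"))
```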

Another example:

"Nuclear Non-Proliferation"

"Nuclear" might match either "energy" or "defence" as the root term. Open to ideas on how to solve this conflict, but probably just a straightforward priority ranking, so that if the parser picks up "energy" first it would then be overridden by "defence" later down the taxonomy as the parser continues to match against the remaining root and alternative terms. Not sure if that's robust enough - would need to test once we've actually developed the taxonomy.
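
One way the priority ranking could work, as a sketch only: keep the taxonomy in priority order (lowest first), walk all of it, and let a later match override an earlier one. The name `resolve_topic`, the ordering, the extra terms "electricity" and "non-proliferation", and the crude substring matching are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical taxonomy, ordered from lowest to highest priority.
TAXONOMY = [
    ("energy", ["nuclear", "electricity"]),
    ("defence", ["nuclear", "non-proliferation"]),
]


def resolve_topic(title: str) -> Optional[str]:
    """Walk the whole taxonomy; a later matching root overrides an earlier one."""
    lowered = title.lower()
    topic = None
    for root, alternatives in TAXONOMY:
        # Substring matching is crude; it is only here to keep the sketch short.
        if any(term in lowered for term in alternatives):
            topic = root
    return topic


print(resolve_topic("Nuclear Non-Proliferation"))
```

With this ordering, "Nuclear Non-Proliferation" first matches "energy" and is then overridden by "defence", which is the behaviour described above. Whether a single global ordering is robust enough would still need testing against real titles.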

I'm guessing we'd have about 15 root words and a total of 50 alternative terms mapped to them.

More examples:

"Wetlands" would map to "environment"
"Nation building" would map to "infrastructure"
"Telecommunications Interception" would map to "law-enforcement"
"Customs Tariff" would map to "trade"

etc.

For anything that can't be mapped, the topic would be omitted. We'd keep an eye on that and continue entering alternative terms into the taxonomy (and even expanding the root words / topics list if required) to cater for more keywords in debate titles as we notice them.

Does that make sense?

The actual table would look something like:

alternative, root
pollution, environment
wetlands, environment
carbon, environment
climate, environment
sorry day, indigenous
native land, indigenous
workplace, employment
union, employment

...
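
As a sketch, a table in this shape could be loaded into a lookup with Python's standard `csv` module. The function name `load_taxonomy` is an assumption, and the rows are just the sample rows above held as text rather than in a database table.

```python
import csv
import io

# The sample rows from the table above, as comma-separated text.
TABLE = """\
alternative, root
pollution, environment
wetlands, environment
carbon, environment
climate, environment
sorry day, indigenous
native land, indigenous
workplace, employment
union, employment
"""


def load_taxonomy(text: str) -> dict:
    """Map each alternative term to its root term."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["alternative"].strip(): row["root"].strip() for row in reader}


taxonomy = load_taxonomy(TABLE)
print(taxonomy["sorry day"])
```

Note that some alternatives ("sorry day", "native land") are two words, so the matcher would need to look for phrases in the title rather than single words only.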

Henare Degan added a comment -

Just had a look at how theyworkforyou.co.nz does it and they appear to do type/date/topic but I think I prefer your suggestion.

As for how the topic's done, maybe it's worth a browse through twfynz's source for inspiration too.

