National Association of REALTORS® Center for REALTOR® Technology
PolicyPage Releases beyond 1.0.1

James W Miller
millerjw@world.oberlin.edu

April 2006

Table of Contents

Table of Figures
Introduction
Requirements for PolicyPage beyond release 1.0.1
User Requirements
Technical Considerations
Brief Analysis of each Requirement
Spidering
Search Dynamically
Install without loss of data
Extract and convert data
Use MySQL
Perform Reporting
Create More Complex Rules
Store additional information
Short Term Plan and Design
Plan for release 1.0.1 (57 hour budget - target May 5)
Plan for release 1.0.2 (Target to be determined)
Resolution of Design Issues for Release 1.0.1
Appendix A - Requirements from Mark Flavin of Bayeast
Appendix B - Approach to implementing MySQL support
Appendix C - Persistent Storage Requirements

Table of Figures

Figure 1 - Policy Page Entity Relationship Diagram
Figure 2 - PolicyPage Data Structure Diagram

Introduction

PolicyPage Release 1.0 has been released. This document has two purposes. The first is to
explore requirements and list possible implementation approaches for future releases. The major
areas covered are spidering, dynamic addressing, database structure and support, and
installation enhancements. The second purpose is to set the scope for the next release and to
document the database issues and resolutions required to begin implementation.

Requirements for PolicyPage beyond release 1.0.1

Requirements are listed here with a few high level design questions. Each requirement is
explored in further detail later in the document.


User Requirements

Users of PolicyPage want to:

1. Spider: Automatically process more than one page per web site (spider to lower levels on
the site). Questions to be explored include how many levels should be scanned and whether all
rules should be applied to every page found in the spider’s path. For example, if one of the
200 pages examined is missing a copyright message, should the site fail?


2. Search Dynamically: Retrieve one or more listings and evaluate the retrieved results
against a set of PolicyPage rules. This has been called “dynamic addressing” because the
address of the page is not fixed and known ahead of time. It may be necessary for the user of
a site to specify some search criteria, after which the site generates the address used to
retrieve the results. The main question to be explored is how the information necessary to
generate the dynamic retrieval will be specified to PolicyPage. Do the sites being tested know
they are being tested? If so, will they be willing to provide the information necessary to
test their site, such as the location of a form to be filled out and the values to provide to
that form?


3. Install without loss of data: Upgrade to a new release of PolicyPage without
inadvertently destroying existing data. The existing Windows automatic install process must be
changed to avoid this problem. Data installation is a separate step in the manual install
process, but it is possible for a user to inadvertently run this separate step and destroy
most existing PolicyPage data.



4. Extract and convert data: Extract data to move it from one database to another or from
one database manager to another. Also, extract data for diagnostic purposes such as sending
existing definitions for Web sites, Profiles, and Rules to CRT. To avoid implementing Extract
and Load facilities more than once, it would be a good idea to put Rules and Profiles into the
PolicyPage database first.


5. Use MySQL: Use the MySQL database instead of the SQLite database. Some users already
have this database installed or prefer it to SQLite. Because PolicyPage uses a DBMS
independent access layer (ADODB) to perform database I/O, it is not a difficult programming
task to add support for MySQL. Installation and documentation issues must also be addressed.


6. Perform reporting: Generate management reports and graphs based on the number of sites
that pass and fail within a range of dates. Also report which rules cause the most failures.
See Appendix A for a document provided by Mark Flavin of Bayeast.


7. Create more complex rules: See Appendix A for a list of possible rules. An example of a
more complex rule is “Make sure there is no phone number in the remarks section”.


8. Save additional relevant information with the persisted results: Information such as a
copy of the web page evaluated and information that can be obtained from the WHOIS command
should be saved for future reference. The original documentation for this requirement from
Keith Garner is attached in Appendix C.




Technical Considerations

The implementation of spidering is a major enhancement involving significant code additions to
the existing web site evaluation logic. Therefore, it is a good time to evaluate the language
environment that will be used for web site evaluation.


A small project can be undertaken to see what code exists in each of the candidate language
environments that we could leverage during the implementation. Language environments that we
have most seriously discussed so far include PHP and Ruby. This project can be done
concurrently with the development of release 1.0.1.


Brief Analysis of each Requirement

Spidering

The initial implementation of spidering has been scoped to provide most of the functionality we
believe PolicyPage users want from a spidering feature. We believe that most users want to
check many listings by specifying a single starting page.


PolicyPage will be enhanced to allow the user to specify, for any web site, the name of a link
to be followed from the starting page. PolicyPage will then follow each occurrence of that link
for one level and process the pages it finds using all the rules for the site.


For example, if you specify the site http://hbrhomes.com/search_results.cfm as the starting
point and a link name of “Learn More …”, PolicyPage would process all 10 listings on this page
and apply the rules specified for the http://hbrhomes.com/search_results.cfm site. As an
example, each of the 10 listings could be checked for the presence of the Max Internet Data
Exchange logo and the text string “This information is provided by the Bay East MLS”.


On some sites, it will be necessary to locate a suitable starting page and then copy the URL of
that page into PolicyPage. For example, at californiamoves.com, one possible starting point
would be

http://www.californiamoves.com/property/propertyresults.aspx?rpp=10&sort1=listprice&mls=&thumbs=1&PType=SFAM&page=1&MinBed=0&CommunityIDList=&MaxPrice=999999000&CityIDList=m1042&sortord=D&sort2=city&Zip=92109&Street=&sqft=0&PropSearch=0&CountyIDList=&MinPrice=0&MinBath=0


From this starting point, the link “View Details” would be specified and PolicyPage will
apply the rules for the site to each of the 10 listings shown on the page.


The major assumption in this design is that the PolicyPage user will not want to check
thousands and thousands of listings but will instead want to select a small sample of listings
to be tested.
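
The one-level link following described in this section could be sketched roughly as follows.
This is an illustration in Python using only the standard library, not PolicyPage code: the
class and function names, the sample HTML, and the detail.cfm paths are all hypothetical. It
collects every link on a starting page whose visible text matches the link name the user
supplied and resolves each one against the starting URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def _normalize(text):
    """Collapse runs of whitespace and ignore case when comparing link names."""
    return " ".join(text.split()).lower()


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element whose visible text matches link_name."""

    def __init__(self, link_name):
        super().__init__()
        self.link_name = _normalize(link_name)
        self.matches = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if _normalize("".join(self._text)) == self.link_name:
                self.matches.append(self._href)
            self._href = None


def links_to_follow(start_url, html, link_name):
    """Return absolute URLs for each occurrence of link_name on the starting page."""
    collector = LinkCollector(link_name)
    collector.feed(html)
    collector.close()
    return [urljoin(start_url, href) for href in collector.matches]


# Hypothetical page content; a real run would fetch the starting page first.
sample_page = """
<a href="detail.cfm?id=1">Learn More ...</a>
<a href="about.cfm">About Us</a>
<a href="detail.cfm?id=2"> Learn  More ... </a>
"""
urls = links_to_follow("http://hbrhomes.com/search_results.cfm", sample_page, "Learn More ...")
```

Each URL in `urls` would then be fetched and evaluated with the rules for the site. Stopping
after this single level matches the scope described above.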




Search Dynamically

This design discussion uses the Coldwell Banker San Diego site to explore what can be specified
as a fixed address and what must be generated dynamically.


In this example, there are four steps. Each step shows the URL that the system generates based
on the data entered in the step.

First -- fill in zip code 92109

http://www.cbsocal.com/property/propertySearchCCZ.aspx?PropSearch=0&sort1=listprice&sortord=D&rpp=10&qstextccz=92109&page=1

Second -- check Pacific Beach

http://www.cbsocal.com/property/propertySearch.aspx?page=1&CityIDList=m1042&PropSearch=0&Zip=92109&rpp=10&sort1=listprice&sortord=D&FromPropList=1

Third -- specify a price of 500,000 - 1,000,000

http://www.cbsocal.com/property/propertyresults.aspx?rpp=10&sort1=listprice&mls=&thumbs=1&PType=SFAM&page=1&MinBed=0&CommunityIDList=&MaxPrice=1000000&CityIDList=m1042&sortord=D&sort2=city&Zip=92109&Street=&sqft=0&PropSearch=0&CountyIDList=&MinPrice=500000&MinBath=0

Fourth -- click View Details on the first listing

http://www.cbsocal.com/property/propertydetails.aspx?propertyguid=41c3a639-4f42-4abe-aa4f-afa78bc1a273&rpp=10&sort1=listprice&mls=&thumbs=1&PType=SFAM&page=1&MinBed=0&CommunityIDList=&MaxPrice=1000000&CityIDList=m1042&sortord=D&sort2=city&Zip=92109&Street=&sqft=0&PropSearch=0&CountyIDList=&MinPrice=500000&MinBath=0

All addresses are predictable until the fourth step. For example, you could enter the third URL
directly into PolicyPage and be fairly certain that it would produce a page with a number of
listing summaries. It is then necessary to click on one of the many listings to get listing
details. The details page contains the information that would normally be checked by
PolicyPage, such as the presence of “Information deemed to be reliable” and “MLS Source:
SANDICOR”.

In this example, the specific property (propertyguid=41c3a639-4f42-4abe-aa4f-afa78bc1a273)
cannot be specified directly to PolicyPage. The property will eventually be sold and will
disappear from the database. However, it is likely that Pacific Beach and Zip Code 92109 will
always be in the database.


Initial design proposal:

The functionality that might be required here can be achieved by using the design already
specified under “spidering” if you specify the third URL and request that PolicyPage follow the
“View details” links.
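
Because the first three addresses are predictable, a results-page URL like the third one can
be assembled from the search criteria alone. The sketch below illustrates this in Python with
the standard library. The parameter names and values are taken from the example URLs above,
but the function name is hypothetical, and the assumption that the site accepts this reduced
set of parameters has not been verified.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

RESULTS_PAGE = "http://www.cbsocal.com/property/propertyresults.aspx"


def results_url(zip_code, city_id, min_price, max_price):
    """Build a results-page address from fixed search criteria.

    Only a subset of the parameters seen in the example URL is included;
    whether the site needs the remaining (mostly empty) ones is unknown.
    """
    params = {
        "rpp": 10,
        "sort1": "listprice",
        "PType": "SFAM",
        "page": 1,
        "CityIDList": city_id,
        "Zip": zip_code,
        "MinPrice": min_price,
        "MaxPrice": max_price,
    }
    return RESULTS_PAGE + "?" + urlencode(params)


url = results_url("92109", "m1042", 500000, 1000000)
query = parse_qs(urlsplit(url).query)  # parsed back only to show the round trip
```

A URL built this way would serve as the starting page, with the “View details” links followed
from there. The propertyguid values never need to be known in advance.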




Install without loss of data

The automatic PolicyPage install must be modified so that it does not overlay existing user
data. There is no problem with the current Linux approach because each release installs into a
different directory.
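
One way an installer could avoid overlaying user data is sketched below in Python. This is not
the PolicyPage installer: the function name, directory layout, and backup naming scheme are
assumptions made for illustration. The file name ppage.db is the one that appears in the
connection settings quoted in Appendix B. The shipped empty database is copied into place only
when no database exists. Otherwise the existing file is backed up and left untouched.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def install_database(empty_db: Path, target_db: Path) -> Optional[Path]:
    """Install the shipped database without destroying existing data.

    Returns the path of the backup that was made, or None on a fresh install.
    """
    if target_db.exists():
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
        backup = target_db.with_name(f"{target_db.name}.{stamp}.bak")
        shutil.copy2(target_db, backup)   # back up, then leave the live file alone
        return backup
    target_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(empty_db, target_db)     # fresh install: nothing to lose
    return None


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    shipped = Path(tmp) / "release" / "ppage.db"
    shipped.parent.mkdir()
    shipped.write_text("empty")
    live = Path(tmp) / "data" / "ppage.db"

    first = install_database(shipped, live)    # fresh install
    live.write_text("user data")               # the user works with PolicyPage
    second = install_database(shipped, live)   # upgrade to a later release

    preserved = live.read_text()
    backup_copy = second.read_text()
```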


Extract and convert data

In a future release, it would be useful to have an extract facility that extracts to a readable
format such as XML or even comma delimited format. This would be useful in converting from
SQLite to MySQL databases.
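
A comma delimited extract of a single table might look like the following sketch, written in
Python with the standard sqlite3 and csv modules. The table name, its columns, and the
function name are hypothetical. The PolicyPage schema is not reproduced here.

```python
import csv
import io
import sqlite3


def extract_table(conn, table, out):
    """Write every row of `table` to `out` as CSV, preceded by a header row.

    `table` is interpolated into the SQL text, so it must come from a
    trusted list of table names, never from user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# In-memory stand-in for a PolicyPage database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE website (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO website (url) VALUES ('http://hbrhomes.com/search_results.cfm')")

buffer = io.StringIO()
extract_table(conn, "website", buffer)
csv_text = buffer.getvalue()
```

The header row carries the column names, so a matching load step on the MySQL side can map
columns by name instead of by position.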



Use MySQL

There have been requests to use MySQL. Since PolicyPage already uses the ADODB abstraction
layer between the application code and the DBMS, this would not be difficult. Appendix B
contains some details.


Perform Reporting

Release 0.9.8 added a column in the Rundetails table that indicates whether a site passed or
failed. This can be used for summary reporting without the need to instantiate the object that
resides in the OBJECT_DATA column of the Rundetails table.
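
A pass/fail count over a range of dates could then be produced with a single query, as in the
Python sketch below. Only the Rundetails table and its OBJECT_DATA column are named in this
document. The other column names (site, run_date, passed), the sample rows, and the function
name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the Rundetails table; the real column names may differ.
conn.execute(
    "CREATE TABLE rundetails (site TEXT, run_date TEXT, passed INTEGER, object_data BLOB)"
)
conn.executemany(
    "INSERT INTO rundetails (site, run_date, passed) VALUES (?, ?, ?)",
    [
        ("a.example", "2006-04-03", 1),
        ("b.example", "2006-04-10", 0),
        ("c.example", "2006-04-17", 1),
        ("d.example", "2006-05-02", 0),
    ],
)


def pass_fail_counts(conn, start, end):
    """Count passing and failing runs between two ISO dates, inclusive.

    OBJECT_DATA is never read, so no results object has to be instantiated.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(passed = 1), 0), COALESCE(SUM(passed = 0), 0) "
        "FROM rundetails WHERE run_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return {"passed": row[0], "failed": row[1]}


april = pass_fail_counts(conn, "2006-04-01", "2006-04-30")
```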


Create More Complex Rules

No analysis has been performed on this item. It would be desirable to convert Rules and
Profiles to database tables before any changes are made to the design of Rule and Profile
information.
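
As a rough indication of what a more complex rule involves, the example rule quoted in the
requirements (“Make sure there is no phone number in the remarks section”) could be
approximated with a pattern match, as in this Python sketch. It is not a proposed design: the
pattern covers only common North American formats, the function name is hypothetical, and the
step that isolates the remarks section from the rest of the page is assumed to happen
elsewhere.

```python
import re

# Matches forms such as 312-555-0100, 312.555.0100, 312 555 0100 and (312) 555-0100.
PHONE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def remarks_contain_phone(remarks):
    """Return True if the remarks text appears to contain a phone number."""
    return PHONE.search(remarks) is not None
```

A rule of this kind fails the listing when the function returns True. Prices, years, and
square footage in the remarks do not match, because the pattern requires three separated digit
groups.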



Store additional information

PolicyPage currently persists information on which rules succeed and fail when a web site is
evaluated. PolicyPage also saves a link to the page evaluated. However, when users come back
at a later time, the original web page could have changed, and PolicyPage does not store a copy
of the page as it existed when the evaluation occurred. The page should be cached for future
reference. Other information available at the time of web site evaluation could also be
cached, such as the results returned by the WHOIS command. This information can also be
persisted in the resultdetails table.
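
Caching the page alongside the results might look like the following Python sketch. The
columns shown are hypothetical additions, not the actual resultdetails layout, and the
function name and sample page are invented for illustration. Storing a hash and a fetch time
with the page body makes it possible to show later that the stored copy has not been altered
since the evaluation.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real resultdetails table has its own layout.
conn.execute(
    "CREATE TABLE resultdetails (url TEXT, fetched_at TEXT, page_sha256 TEXT, page_body BLOB)"
)


def cache_page(conn, url, body):
    """Store the evaluated page with its fetch time and SHA-256 digest."""
    digest = hashlib.sha256(body).hexdigest()
    fetched_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO resultdetails (url, fetched_at, page_sha256, page_body) VALUES (?, ?, ?, ?)",
        (url, fetched_at, digest, body),
    )
    return digest


page = b"<html><body>MLS Source: SANDICOR</body></html>"
digest = cache_page(conn, "http://www.cbsocal.com/property/propertyresults.aspx", page)
stored_body, stored_digest = conn.execute(
    "SELECT page_body, page_sha256 FROM resultdetails"
).fetchone()
```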


Short Term Plan and Design

Plan for release 1.0.1 (57 hour budget - target May 5)

1. Add the data items required in the URL entity to support 1 level of spidering.
2. Update the user interface to support these new items.
3. Upgrade the install
   a. Back up existing data
   b. Avoid installing on top of existing data
4. Develop application logic to detect and upgrade old databases (but not pre-release 1)


Plan for release 1.0.2 (Target to be determined)

1. Implement the spidering logic
2. Implement basic reporting (count of pass/fail within a range of dates)
3. Cache the evaluated page in the resultdetails table
4. Add the data items required to the resultdetails table to facilitate reporting
5. Possibly implement performance improvement to avoid re-work for each rule.



Resolution of Database Issues for Release 1.0.1

The existing database design did not consider spidering. This is resolved by defining a new
tree of entities to store information related to spidering and dynamic search. The entity type
that implements the tree will be named “Links” and will be stored in the PolicyPage database.
The columns of this new entity in the database will be:


Column          Data Type   Description

LINK_ID         Integer     System assigned sequential number to uniquely identify this link.

OWNER_TYPE      Character   The type of entity that owns this link. Values can be “U” for URL
                            or “L” for another link. Since we anticipate that most users will
                            follow only a single link, most often the OWNER_TYPE will be “U”
                            and the OWNER_ID will be the URL that initiates the search.

OWNER_ID        Integer     Contains the ID of the URL or the LINK that owns this LINK.

LINK_TO_FOLLOW  Varchar     Contains the name of the link that will be followed such as
                            “View Details”. Usually, this name will occur multiple times on
                            the URL or the link that owns this LINK. Example: The owning URL
                            refers to a page that contains links to 25 listings. All these
                            links are identified with the character string “View Details”.

PROFILE         Varchar     Currently contains the file name of the sequential disk file that
                            defines the group of rules that will be used to process this
                            link. If this is left blank, PolicyPage will traverse all the
                            owners until a PROFILE is found.


Database Issues for Release 1.0.2


The existing database needs to provide more fixed columns in the resultdetails table so that
summary reporting and analysis can be performed without instantiating the complete results
object. These columns will be specified after a detailed analysis of the reporting and caching
requirements.







Figure 1 - Policy Page Entity Relationship Diagram

Figure 2 - PolicyPage Data Structure Diagram

Appendix A - Requirements from Mark Flavin of Bayeast


This is what my IDX compliance team has identified as their top priorities in enhancing policy
page


1. Reporting
   a. Graphical report by compliance number for total sites reviewed
   b. Graphical report by groups of sites
   c. Graphical reports of top rules being violated
   d. List of steps detailing compliance on per site basis
   e. Exportable to CSV to create more detailed reports for broker and or council reporting
2. Spidering
   a. The spidering should be dynamic meaning that you should be able set the number of links
      deep from the starting page
   b. The sample size should be random so each time it was run a random number of properties
      were reviewed i.e. 1-100 with the max being configurable in preferences
   c. The spider should be trainable i.e. The Idx compliance agent should be able to identify
      the links that should be spidered
   d. The spider should be able to work if the site content is embedded in a frame
3. The Rules Creation Process
   a. For example If this picture doesn’t exist then check for this picture
   b. Allow for comparison of one picture to another picture vice being dependant on the file
      name
   c. Checking for specific MLS numbers or addresses to make sure the website is updated
      regularly (i.e. Do a search for a property number or address search)
   d. Make sure specific content is not mentioned based on the presence of other criteria
      nearby i.e. no phone numbers in the remarks section
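
The random sample requested in item 2b could be drawn as in the following Python sketch. The
function name and the example links are hypothetical, and the default maximum of 100 simply
mirrors the “1-100” range in the request. A different number of listings, between 1 and the
configured maximum, is reviewed on each run.

```python
import random


def sample_listings(links, max_sample=100, rng=random):
    """Pick a random number of distinct listing links, between 1 and max_sample."""
    if not links:
        return []
    upper = min(max_sample, len(links))
    size = rng.randint(1, upper)      # the sample size itself varies from run to run
    return rng.sample(links, size)    # sampling without replacement


# Hypothetical listing links found on a results page.
links = [f"http://www.example.com/listing/{n}" for n in range(25)]
picked = sample_listings(links, max_sample=5, rng=random.Random(0))
```

Passing a seeded `random.Random` instance, as in the example, makes a run repeatable for
testing. Leaving the default gives a different sample each time.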


Appendix B - Approach to implementing MySQL support


This is taken from a response that Todd Costigan emailed on March 20 to a user considering
using PolicyPage with MySQL:


There have been questions about using MySQL with PolicyPage instead of SQLite.

This documents the changes we feel are necessary to utilize MySQL.

These instructions involve the database used by PolicyPage.

This requires technical assistance.



Using MySQL with PolicyPage.

PolicyPage was coded in anticipation of adding support for databases other than SQLite.

There is logic already in the code to handle MySQL.

This logic is untested, but we wanted to make the required changes known.

Changing the constant BRAND and the connection parameters should be the only code changes
required.

The other major step is to move the data from the SQLite database and load it into the MySQL
database.


Here is the information needed to make the coding changes.

The purpose of the constant BRAND is to allow different code based on what database is being
used.

This constant was used only a few times in anticipation of adding support for MySQL.

It is anticipated that it will be used more frequently once this support is added.


In controllerdef.php change

define("BRAND", SQLITE_BRAND);

to

define("BRAND", MYSQL_BRAND);


Also in controller.def, change the database variables as required

define("DATABASE", DATABASE_DIRECTORY . "/ppage.db");
define("DATABASE_USER","ppageuser");
define("DATABASE_PASSWORD","pppassword");
define("DATABASE_HOST","localhost");


Database connections are made in three places.

It is believed that changing the connections in these three places is all that is required.

Here is where to look (line numbers approximate)


Searching for: ADONewConnection(BRAND)

common\class.user.php(69): $this->con1 = ADONewConnection(BRAND);
common\model.php(194): $this->con1 = ADONewConnection(BRAND);
devscripts\test.php(8): $con1 = ADONewConnection(BRAND);
devscripts\test.php(30): $con1 = ADONewConnection(BRAND);


model.php does almost all database I/O in PolicyPage.

class.user.php was designed to be used in login.

test.php is designed to prove that the database connection works.


model.php: This may work as currently coded since there is already logic to test BRAND.

If BRAND is MySQL, a MySQL connection is attempted.

This has not been tested.


class.user.php: There is also coding here to test for BRAND.

Important: However, by inspecting the code you can see that the MySQL code will not work.

"password" is misspelled as "passwrod" in the MySQL connection string and this must be fixed.


test.php: You will need to move this from the devscripts directory to the main directory.

Test.php assumes that it resides in the main directory.

Lines 5-23 were used in debugging unrelated to database testing and should be removed.

Change the connection logic at around existing lines 29-30 for MySQL.

Test.php can then be used to verify that the MySQL connection is working.

At some point, test.php can be enhanced to support multiple databases by testing the BRAND
constant.


Appendix C - Persistent Storage Requirements




-----Original Message-----
From: Keith T. Garner [mailto:kgarner@crt.realtors.org]
Sent: Tuesday, April 04, 2006 2:14 PM
To: Todd Costigan; Jim Miller
Cc: Mark A. Lesswing
Subject: PolicyPage ideas



o A page cache. It'd be a good idea to cache the page at the time we scan
it and tie it to the results somehow. This way if someone is cited for
being out of compliance, and denies it, the scanning entity has proof of
infraction.

o Any information on the site we can get from public sources. I'm not sure
why we'd want these yet, but my gut tells me it's a good idea to track the
following:

  * IP
  * whois information
  * ARIN information
  * Others I'm not currently thinking of

Anyway, I wanted to document these before I forgot.


Keith

--
Keith T. Garner - Strategic Architect - Center for REALTOR® Technology
kgarner@realtors.org - 312-329-3294 - http://blog.realtors.org/crt